[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-22 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: YARN-8041.005.patch

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch, YARN-8041.004.patch, YARN-8041.005.patch
>
>
> This Jira tracks the implementation of some missing REST invocations in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer
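
For readers new to the Router, these endpoints generally follow the same 
fan-out-and-merge pattern: call every sub-cluster RM and combine the per-RM 
results. The sketch below illustrates only that pattern; the class, interface, 
and method names are hypothetical and are not the actual FederationInterceptorREST 
API or the attached patches.

{code:java}
// Hypothetical fan-out-and-merge sketch. All names here are illustrative;
// they are not the real FederationInterceptorREST types or methods.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FederatedRestFanOutSketch {

  /** Pretend REST client for a single sub-cluster ResourceManager. */
  interface SubClusterRestClient {
    List<String> getAppAttempts(String appId) throws Exception;
  }

  private final Map<String, SubClusterRestClient> clients; // subClusterId -> client

  FederatedRestFanOutSketch(Map<String, SubClusterRestClient> clients) {
    this.clients = clients;
  }

  /** Query every sub-cluster RM and concatenate the per-RM results. */
  List<String> getAppAttempts(String appId) {
    List<String> merged = new ArrayList<>();
    for (Map.Entry<String, SubClusterRestClient> entry : clients.entrySet()) {
      try {
        merged.addAll(entry.getValue().getAppAttempts(appId));
      } catch (Exception e) {
        // One unreachable RM should not fail the whole federated call.
      }
    }
    return merged;
  }
}
{code}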






[jira] [Commented] (YARN-8337) Deadlock Federation Router

2018-05-22 Thread Yiran Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483763#comment-16483763
 ] 

Yiran Wu commented on YARN-8337:


Thanks [~jianchao jia], I have a question.

"INSERT IGNORE" will ignore any error.
Do we need to ensure the data is inserted successfully, or capture errors and 
retry the insert?
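
For concreteness, one way to "capture errors and retry" is to retry only on the 
MySQL deadlock error code (1213) and to treat the update count as the success 
signal. The JDBC sketch below is illustrative only; it reuses the table and 
column names from the log further down and is not the attached patch.

{code:java}
// Illustrative sketch: retry the insert when MySQL reports a deadlock (error 1213).
// Table and column names are taken from the deadlock log below; the retry policy
// and method names are assumptions, not the attached patch.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public final class InsertWithRetrySketch {

  private static final int MYSQL_ER_LOCK_DEADLOCK = 1213;

  static int insertHomeSubCluster(Connection conn, String appId,
      String homeSubCluster, int maxAttempts) throws SQLException {
    String sql = "INSERT IGNORE INTO applicationsHomeSubCluster "
        + "(applicationId, homeSubCluster) VALUES (?, ?)";
    SQLException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setString(1, appId);
        ps.setString(2, homeSubCluster);
        return ps.executeUpdate(); // 1 = inserted, 0 = row already existed (ignored)
      } catch (SQLException e) {
        if (e.getErrorCode() != MYSQL_ER_LOCK_DEADLOCK) {
          throw e; // only deadlocks are retried; everything else is surfaced
        }
        last = e;
      }
    }
    throw last; // all attempts deadlocked
  }
}
{code}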

> Deadlock Federation Router
> --
>
> Key: YARN-8337
> URL: https://issues.apache.org/jira/browse/YARN-8337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Reporter: Jianchao Jia
>Priority: Major
> Attachments: YARN-8337.001.patch
>
>
> We use MySQL InnoDB as the state store engine. In the router log we found a 
> deadlock error like the one below:
> {code:java}
> [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : 
> Unable to insert the newly generated application 
> application_1526295230627_127402
> com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock 
> found when trying to get lock; try restarting transaction
> at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
> at com.mysql.jdbc.Util.getInstance(Util.java:408)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
> at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
> at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
> at 
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
> at 
> com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
> at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
> at 
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
> at 
> org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
> {code}
> Use the "show engine innodb status;" command to see what happened:
> {code:java}
> 2018-05-21 15:41:40 7f4685870700
> *** (1) TRANSACTION:
> TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 
> lock_mode X locks gap before rec insert intention waiting
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info 
> bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; 
> asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 0ba5f32d; asc -;;
> 2: len 7; hex dd00280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** (2) TRANSACTION:
> TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (2) HOLDS THE LOCK(S):
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> 

[jira] [Commented] (YARN-8297) Incorrect ATS Url used for Wire encrypted cluster

2018-05-22 Thread Sunil Govindan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483819#comment-16483819
 ] 

Sunil Govindan commented on YARN-8297:
--

[~rohithsharma], could you please help check the patch?

> Incorrect ATS Url used for Wire encrypted cluster
> -
>
> Key: YARN-8297
> URL: https://issues.apache.org/jira/browse/YARN-8297
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Sunil Govindan
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8297-addendum.patch, YARN-8297.001.patch
>
>
> The "Service" page uses an incorrect web URL for ATS in a wire-encrypted 
> environment. For ATS URLs, it uses the https protocol with the http port.
> This issue causes all ATS calls to fail, and the UI does not display 
> component details.
> url used: 
> https://xxx:8198/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320
> expected url : 
> https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320
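
In short, the fix is to choose the scheme and the matching port together. The 
sketch below shows that selection using standard YARN configuration keys; it is 
illustrative only and is not the yarn-ui-v2 code or the attached patch (the 
default addresses are assumptions).

{code:java}
// Illustrative only: pick the scheme and the matching port together so an https
// URL never ends up with the http port. Property names are standard YARN keys;
// the default addresses are assumptions, and this is not the yarn-ui-v2 patch.
import java.util.Map;

public final class TimelineUrlSketch {

  static String timelineBase(Map<String, String> conf) {
    boolean https = "HTTPS_ONLY".equals(
        conf.getOrDefault("yarn.http.policy", "HTTP_ONLY"));
    String address = https
        ? conf.getOrDefault("yarn.timeline-service.webapp.https.address",
            "localhost:8190")
        : conf.getOrDefault("yarn.timeline-service.webapp.address",
            "localhost:8188");
    return (https ? "https://" : "http://") + address + "/ws/v2/timeline";
  }
}
{code}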






[jira] [Commented] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483838#comment-16483838
 ] 

genericqa commented on YARN-8041:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 53s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 10 new 
+ 18 unchanged - 0 fixed = 28 total (was 18) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
5s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 67m  
8s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 23s{color} 
| {color:red} hadoop-yarn-server-router in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 19s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.router.webapp.TestFederationInterceptorREST |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8041 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924505/YARN-8041.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ee43e60accb6 

[jira] [Commented] (YARN-6919) Add default volume mount list

2018-05-22 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483859#comment-16483859
 ] 

Shane Kumpf commented on YARN-6919:
---

Thanks for the patch [~ebadger]! I tested this out and it is working as 
intended. +1 lgtm, I'll commit this later today if there are no objections.

> Add default volume mount list
> -
>
> Key: YARN-6919
> URL: https://issues.apache.org/jira/browse/YARN-6919
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6919.001.patch, YARN-6919.002.patch
>
>
> Piggybacking on YARN-5534, we should create a default list that bind mounts 
> selected volumes into all docker containers. This list will be empty by 
> default 






[jira] [Commented] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded

2018-05-22 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483871#comment-16483871
 ] 

Gergo Repas commented on YARN-8273:
---

[~rkanter] Thanks for the review! Yes, indeed LogAggregationDFSException can be 
a checked exception (and a subclass of YarnException), I've updated the patch.
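
For readers following along, a checked exception that subclasses YarnException 
is roughly as small as the sketch below; the constructors actually used in the 
patch may differ.

{code:java}
// Minimal sketch of a checked exception extending YarnException; the real
// LogAggregationDFSException in the patch may declare different constructors.
import org.apache.hadoop.yarn.exceptions.YarnException;

public class LogAggregationDFSException extends YarnException {

  private static final long serialVersionUID = 1L;

  public LogAggregationDFSException(Throwable cause) {
    super(cause);
  }
}
{code}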

> Log aggregation does not warn if HDFS quota in target directory is exceeded
> ---
>
> Key: YARN-8273
> URL: https://issues.apache.org/jira/browse/YARN-8273
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-8273.000.patch, YARN-8273.001.patch, 
> YARN-8273.002.patch, YARN-8273.003.patch, YARN-8273.004.patch, 
> YARN-8273.005.patch, YARN-8273.006.patch
>
>
> It appears that if an HDFS space quota is set on a target directory for log 
> aggregation and the quota is already exceeded when log aggregation is 
> attempted, zero-byte log files will be written to the HDFS directory, however 
> NodeManager logs do not reflect a failure to write the files successfully 
> (i.e. there are no ERROR or WARN messages to this effect).
> An improvement may be worth investigating to alert users to this scenario, as 
> otherwise logs for a YARN application may be missing both on HDFS and locally 
> (after local log cleanup is done) and the user may not otherwise be informed.
> Steps to reproduce:
> * Set a small HDFS space quota on /tmp/logs/username/logs (e.g. 2MB)
> * Write files to HDFS such that /tmp/logs/username/logs is almost 2MB full
> * Run a Spark or MR job in the cluster
> * Observe that zero byte files are written to HDFS after job completion
> * Observe that YARN container logs are also not present on the NM hosts (or 
> are deleted after yarn.nodemanager.delete.debug-delay-sec)
> * Observe that no ERROR or WARN messages appear to be logged in the NM role 
> log






[jira] [Updated] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()

2018-05-22 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8173:
---
Attachment: YARN-8041.004.patch

> [Router] Implement missing FederationClientInterceptor#getApplications()
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8041.004.patch, YARN-8173.001.patch, 
> YARN-8173.002.patch, YARN-8173.003.patch
>
>
> Implement the Oozie-dependent methods:
> {code:java}
> getApplications()
> getDelegationToken()
> {code}
>  






[jira] [Updated] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()

2018-05-22 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8173:
---
Attachment: (was: YARN-8041.004.patch)

> [Router] Implement missing FederationClientInterceptor#getApplications()
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> Implement the Oozie-dependent methods:
> {code:java}
> getApplications()
> getDelegationToken()
> {code}
>  






[jira] [Updated] (YARN-8297) Incorrect ATS Url used for Wire encrypted cluster

2018-05-22 Thread Sunil Govindan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8297:
-
Attachment: YARN-8297-addendum.patch

> Incorrect ATS Url used for Wire encrypted cluster
> -
>
> Key: YARN-8297
> URL: https://issues.apache.org/jira/browse/YARN-8297
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Sunil Govindan
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8297-addendum.patch, YARN-8297.001.patch
>
>
> The "Service" page uses an incorrect web URL for ATS in a wire-encrypted 
> environment. For ATS URLs, it uses the https protocol with the http port.
> This issue causes all ATS calls to fail, and the UI does not display 
> component details.
> url used: 
> https://xxx:8198/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320
> expected url : 
> https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320






[jira] [Reopened] (YARN-8297) Incorrect ATS Url used for Wire encrypted cluster

2018-05-22 Thread Sunil Govindan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan reopened YARN-8297:
--

This issue was not handled cleanly; an addendum patch is needed.

> Incorrect ATS Url used for Wire encrypted cluster
> -
>
> Key: YARN-8297
> URL: https://issues.apache.org/jira/browse/YARN-8297
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Sunil Govindan
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8297-addendum.patch, YARN-8297.001.patch
>
>
> The "Service" page uses an incorrect web URL for ATS in a wire-encrypted 
> environment. For ATS URLs, it uses the https protocol with the http port.
> This issue causes all ATS calls to fail, and the UI does not display 
> component details.
> url used: 
> https://xxx:8198/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320
> expected url : 
> https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320






[jira] [Commented] (YARN-8297) Incorrect ATS Url used for Wire encrypted cluster

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483886#comment-16483886
 ] 

genericqa commented on YARN-8297:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
36m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 30s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8297 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924526/YARN-8297-addendum.patch
 |
| Optional Tests |  asflicense  shadedclient  |
| uname | Linux 6f1bf834c6e8 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 57c2feb |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 312 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20820/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Incorrect ATS Url used for Wire encrypted cluster
> -
>
> Key: YARN-8297
> URL: https://issues.apache.org/jira/browse/YARN-8297
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Sunil Govindan
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8297-addendum.patch, YARN-8297.001.patch
>
>
> The "Service" page uses an incorrect web URL for ATS in a wire-encrypted 
> environment. For ATS URLs, it uses the https protocol with the http port.
> This issue causes all ATS calls to fail, and the UI does not display 
> component details.
> url used: 
> https://xxx:8198/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320
> expected url : 
> https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320






[jira] [Updated] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()

2018-05-22 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8173:
---
Attachment: YARN-8041.004.patch

> [Router] Implement missing FederationClientInterceptor#getApplications()
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> Implement the Oozie-dependent methods:
> {code:java}
> getApplications()
> getDelegationToken()
> {code}
>  






[jira] [Comment Edited] (YARN-8337) Deadlock Federation Router

2018-05-22 Thread Jianchao Jia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483773#comment-16483773
 ] 

Jianchao Jia edited comment on YARN-8337 at 5/22/18 10:41 AM:
--

[~yiran] Thanks for your comment.

If the record exists in the table, "row_count()" will return zero; otherwise it 
will return one.

In SQLFederationStateStore.java the two return values are handled differently.

[~giovanni.fumarola] could you review this, or give other advice?


was (Author: jianchao jia):
[~yiran] Thanks for your comment.

If the record exists in the table, "row_count()" will return zero; otherwise it 
will return one.

In SQLFederationStateStore.java the two return values are handled differently.

[~giovanni.fumarola] could you review this, or give other advice?

> Deadlock Federation Router
> --
>
> Key: YARN-8337
> URL: https://issues.apache.org/jira/browse/YARN-8337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Reporter: Jianchao Jia
>Priority: Major
> Attachments: YARN-8337.001.patch
>
>
> We use MySQL InnoDB as the state store engine. In the router log we found a 
> deadlock error like the one below:
> {code:java}
> [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : 
> Unable to insert the newly generated application 
> application_1526295230627_127402
> com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock 
> found when trying to get lock; try restarting transaction
> at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
> at com.mysql.jdbc.Util.getInstance(Util.java:408)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
> at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
> at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
> at 
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
> at 
> com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
> at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
> at 
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
> at 
> org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
> {code}
> Use the "show engine innodb status;" command to see what happened:
> {code:java}
> 2018-05-21 15:41:40 7f4685870700
> *** (1) TRANSACTION:
> TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 
> lock_mode X locks gap before rec insert intention waiting
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info 
> bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; 
> asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 0ba5f32d; asc -;;
> 2: len 7; hex dd00280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** (2) TRANSACTION:
> TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL 

[jira] [Commented] (YARN-8337) Deadlock Federation Router

2018-05-22 Thread Jianchao Jia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483773#comment-16483773
 ] 

Jianchao Jia commented on YARN-8337:


[~yiran] Thanks for your comment.

If the record exists in the table, "row_count()" will return zero; otherwise it 
will return one.

In SQLFederationStateStore.java the two return values are handled differently.

[~giovanni.fumarola] could you review this, or give other advice?
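
For readers following the thread, the distinction described above looks roughly 
like this on the caller side. The procedure name and parameter order below are 
assumptions for illustration, not the exact SQLFederationStateStore 
stored-procedure contract.

{code:java}
// Sketch only: distinguish "inserted" from "already registered" via a
// ROW_COUNT()-style OUT parameter. The procedure name and parameter order are
// assumptions, not the exact SQLFederationStateStore stored-procedure contract.
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

public final class AddHomeSubClusterSketch {

  static boolean addIfAbsent(Connection conn, String appId, String homeSubCluster)
      throws SQLException {
    try (CallableStatement cs =
        conn.prepareCall("{call sp_addApplicationHomeSubCluster(?, ?, ?)}")) {
      cs.setString(1, appId);
      cs.setString(2, homeSubCluster);
      cs.registerOutParameter(3, Types.INTEGER);
      cs.executeUpdate();
      int rowCount = cs.getInt(3);
      // 1 -> this router inserted the row; 0 -> another router already registered
      // a home sub-cluster, so the caller should read the stored value back instead.
      return rowCount == 1;
    }
  }
}
{code}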

> Deadlock Federation Router
> --
>
> Key: YARN-8337
> URL: https://issues.apache.org/jira/browse/YARN-8337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Reporter: Jianchao Jia
>Priority: Major
> Attachments: YARN-8337.001.patch
>
>
> We use MySQL InnoDB as the state store engine. In the router log we found a 
> deadlock error like the one below:
> {code:java}
> [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : 
> Unable to insert the newly generated application 
> application_1526295230627_127402
> com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock 
> found when trying to get lock; try restarting transaction
> at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
> at com.mysql.jdbc.Util.getInstance(Util.java:408)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
> at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
> at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
> at 
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
> at 
> com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
> at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
> at 
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
> at 
> org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
> {code}
> Use the "show engine innodb status;" command to see what happened:
> {code:java}
> 2018-05-21 15:41:40 7f4685870700
> *** (1) TRANSACTION:
> TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 
> lock_mode X locks gap before rec insert intention waiting
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info 
> bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; 
> asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 0ba5f32d; asc -;;
> 2: len 7; hex dd00280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** (2) TRANSACTION:
> TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (2) HOLDS THE LOCK(S):
> RECORD LOCKS 

[jira] [Commented] (YARN-8337) Deadlock Federation Router

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483772#comment-16483772
 ] 

genericqa commented on YARN-8337:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 48s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 43s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8337 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924514/YARN-8337.001.patch |
| Optional Tests |  asflicense  |
| uname | Linux 27b9a4520364 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 57c2feb |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 475 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn U: 
hadoop-yarn-project/hadoop-yarn |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20819/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Deadlock Federation Router
> --
>
> Key: YARN-8337
> URL: https://issues.apache.org/jira/browse/YARN-8337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Reporter: Jianchao Jia
>Priority: Major
> Attachments: YARN-8337.001.patch
>
>
> We use MySQL InnoDB as the state store engine. In the router log we found a 
> deadlock error like the one below:
> {code:java}
> [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : 
> Unable to insert the newly generated application 
> application_1526295230627_127402
> com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock 
> found when trying to get lock; try restarting transaction
> at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
> at com.mysql.jdbc.Util.getInstance(Util.java:408)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
> at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
> at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
> at 
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
> at 
> com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
> at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
> at 
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
> 

[jira] [Commented] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service

2018-05-22 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483813#comment-16483813
 ] 

Weiwei Yang commented on YARN-8320:
---

Some updates: I am working with [~yangjiandan] on polishing the design doc and 
will add more details and explanations this week. Please feel free to comment 
and share your thoughts.

> Add support CPU isolation for latency-sensitive  (LS) service
> -
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate CPU resources. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler 
> with no support for differentiated latency.
>  * The request latency of services running in containers can fluctuate 
> significantly when all containers share CPUs, which latency-sensitive 
> services cannot afford in our production environment.
> So we need finer-grained CPU isolation.
> My co-workers and I propose a solution that uses cgroup cpuset to bind 
> containers to different processors; this is inspired by the isolation 
> technique in the [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
>  Later I will upload a detailed design doc.
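
At the OS level, the cpuset approach described above amounts to writing a 
processor list into a per-container cgroup before attaching the container's 
tasks. The sketch below is illustrative only; the cgroup mount point and 
hierarchy names are assumptions, and it is not the shared WIP patch.

{code:java}
// Illustrative cgroup v1 cpuset binding: pin a container to a CPU list by
// writing the cpuset files of a per-container cgroup directory. The paths are
// assumptions; the real NodeManager hierarchy and the patch may differ.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class CpusetSketch {

  static void pin(String containerId, String cpuList) throws IOException {
    Path dir = Paths.get("/sys/fs/cgroup/cpuset/hadoop-yarn", containerId);
    Files.createDirectories(dir);
    // Both cpuset.cpus and cpuset.mems must be non-empty before any task
    // can be attached to the cgroup.
    Files.write(dir.resolve("cpuset.mems"), "0".getBytes(StandardCharsets.UTF_8));
    Files.write(dir.resolve("cpuset.cpus"),
        cpuList.getBytes(StandardCharsets.UTF_8));
    // The container's processes are then attached by writing their PIDs to "tasks".
  }
}
{code}

For example, pin("container_01", "0-3") would restrict that (hypothetical) 
container to the first four processors.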






[jira] [Commented] (YARN-8335) Privileged docker containers' jobSubmitDir does not get successfully cleaned up

2018-05-22 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483863#comment-16483863
 ] 

Shane Kumpf commented on YARN-8335:
---

Is this a dupe of YARN-7904?

> Privileged docker containers' jobSubmitDir does not get successfully cleaned 
> up
> ---
>
> Key: YARN-8335
> URL: https://issues.apache.org/jira/browse/YARN-8335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>  Labels: Docker
>
> The jobSubmitDir directory is owned by root and is being cleaned up as the 
> submitting user, which appears to be why it is failing to clean up.
> {noformat}
> 2018-05-21 19:46:15,124 WARN  [DeletionService #0] 
> privileged.PrivilegedOperationExecutor 
> (PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell 
> execution returned exit code: 255. Privileged Execution Operation Stderr:
> Stdout: main : command provided 3
> main : run as user is ebadger
> main : requested yarn user is ebadger
> failed to unlink 
> /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01/jobSubmitDir/job.split:
>  Permission denied
> failed to unlink 
> /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01/jobSubmitDir/job.splitmetainfo:
>  Permission denied
> failed to rmdir jobSubmitDir: Directory not empty
> Error while deleting 
> /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01:
>  39 (Directory not empty)
> Full command array for failed execution:
> [/hadoop-3.2.0-SNAPSHOT/bin/container-executor, ebadger, ebadger, 3, 
> /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01]
> 2018-05-21 19:46:15,124 ERROR [DeletionService #0] 
> nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(848)) - DeleteAsUser for 
> /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:206)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:844)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.FileDeletionTask.run(FileDeletionTask.java:135)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
> at org.apache.hadoop.util.Shell.run(Shell.java:902)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
> ... 10 more
> {noformat}
> {noformat}
> [foo@bar hadoop]$ ls -l 
> /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01/
> total 4
> drwxr-sr-x 2 root users 4096 May 21 19:45 jobSubmitDir
> {noformat}






[jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart

2018-05-22 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483864#comment-16483864
 ] 

Gergo Repas commented on YARN-8191:
---

[~haibochen] Thanks for the review!
1) - Good point, I fixed it.
2) - This logic's origin is a suggestion from [~wilfreds] (Wilfred - please 
correct me if I'm wrong about the intentions behind {{getRemovedStaticQueues(), 
setQueuesToDynamic()}}). The point here is that the set of removed queues can 
be gathered in {{AllocationReloadListener.onReload()}} outside of the 
writeLock. It's safe to do so because onReload() is only called from the 
synchronized {{AllocationFileLoaderService.reloadAllocations()}} method. This 
way the {{AllocationReloadListener.getRemovedStaticQueues()}} logic is subject 
to the least amount of locking. The thread safety was indeed missing for 
{{QueueManager.setQueuesToDynamic()}}, I've added the missing synchronized 
block.
3) Sorry, what do you mean by "What about the other case where some dynamic 
queues are not added as static in the new allocation file?". If you mean 
dynamic queue creation via application submission, the test case for this (+the 
removal) is {{TestQueueManager.testRemovalOfDynamicLeafQueue()}}.
4-5) I have refactored this part of the code: I removed 
getIncompatibleQueueName() and changed only the return type of 
removeEmptyIncompatibleQueues() so that it indicates whether any queue removal 
was attempted.
6) {{updateAllocationConfiguration()}} is only called when the configuration 
file has been modified, so if for example there's only one configuration 
modification during the lifetime of the RM, incompatible queues would not be 
cleaned up until a restart.
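
A stripped-down sketch of the locking shape described in point 2 above (class 
and method names simplified; this is not the actual QueueManager or 
FairScheduler code):

{code:java}
// Simplified sketch of the pattern discussed above: compute the removed static
// queues outside the scheduler write lock, then flip them to dynamic under
// synchronization. Names are illustrative, not the real QueueManager code.
import java.util.HashSet;
import java.util.Set;

public final class QueueReloadSketch {

  private final Set<String> staticQueues = new HashSet<>();
  private final Set<String> dynamicQueues = new HashSet<>();

  /** Called from the already-synchronized allocation reload, no write lock held. */
  Set<String> getRemovedStaticQueues(Set<String> newStaticQueues) {
    Set<String> removed = new HashSet<>(staticQueues);
    removed.removeAll(newStaticQueues);
    return removed;
  }

  /** Applied later while the scheduler holds its write lock. */
  synchronized void setQueuesToDynamic(Set<String> removed) {
    staticQueues.removeAll(removed);
    dynamicQueues.addAll(removed); // empty dynamic queues are cleaned up afterwards
  }
}
{code}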

> Fair scheduler: queue deletion without RM restart
> -
>
> Key: YARN-8191
> URL: https://issues.apache.org/jira/browse/YARN-8191
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: Queue Deletion in Fair Scheduler.pdf, 
> YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, 
> YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, 
> YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch, 
> YARN-8191.009.patch, YARN-8191.010.patch, YARN-8191.011.patch, 
> YARN-8191.012.patch, YARN-8191.013.patch
>
>
> The Fair Scheduler never cleans up queues even if they are deleted in the 
> allocation file, or were dynamically created and are never going to be used 
> again. Queues always remain in memory which leads to two following issues.
>  # Steady fairshares aren’t calculated correctly due to remaining queues
>  # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource 
> Manager:
>  # Static queues without any entries that are removed from fair-scheduler.xml 
> should be deleted from memory.
>  # Dynamic queues without any entries should be deleted.
>  # RM Web UI should only show the queues defined in the scheduler at that 
> point in time.






[jira] [Created] (YARN-8337) Deadlock Federation Router

2018-05-22 Thread Jianchao Jia (JIRA)
Jianchao Jia created YARN-8337:
--

 Summary: Deadlock Federation Router
 Key: YARN-8337
 URL: https://issues.apache.org/jira/browse/YARN-8337
 Project: Hadoop YARN
  Issue Type: Bug
  Components: federation, router
Reporter: Jianchao Jia


We use MySQL InnoDB as the state store engine. In the router log we found a 
deadlock error like the one below:
{code:java}
[2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : 
Unable to insert the newly generated application 
application_1526295230627_127402
com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock 
found when trying to get lock; try restarting transaction
at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown 
Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.Util.getInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
at 
com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
at 
com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
at 
com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
at 
com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
at 
com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
at 
com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
at 
com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
at 
com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
at 
org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
{code}
Use the "show engine innodb status;" command to see what happened:
{code:java}
2018-05-21 15:41:40 7f4685870700
*** (1) TRANSACTION:
TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 
4999
mysql tables in use 2, locked 2
LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 
192.168.1.138 federation executing
INSERT INTO applicationsHomeSubCluster
(applicationId,homeSubCluster)
(SELECT applicationId_IN, homeSubCluster_IN
FROM applicationsHomeSubCluster
WHERE applicationId = applicationId_IN
HAVING COUNT(*) = 0 )
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
`guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 
lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; 
asc application_1526295230627_1274; (total 31 bytes);
1: len 6; hex 0ba5f32d; asc -;;
2: len 7; hex dd00280110; asc ( ;;
3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;

*** (2) TRANSACTION:
TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 
4999
mysql tables in use 2, locked 2
4 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 
192.168.1.138 federation executing
INSERT INTO applicationsHomeSubCluster
(applicationId,homeSubCluster)
(SELECT applicationId_IN, homeSubCluster_IN
FROM applicationsHomeSubCluster
WHERE applicationId = applicationId_IN
HAVING COUNT(*) = 0 )
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
`guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539 
lock mode S locks gap before rec
Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; 
asc application_1526295230627_1274; (total 31 bytes);
1: len 6; hex 0ba5f32d; asc -;;
2: len 7; hex dd00280110; asc ( ;;
3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
`guldan_federationstatestore`.`applicationshomesubcluster` trx id 

[jira] [Updated] (YARN-8320) Support CPU isolation for latency-sensitive (LS) service

2018-05-22 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8320:
--
Description: 
Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
“cpu.shares” to isolate CPU resources. However,
 * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler 
with no support for differentiated latency.
 * The request latency of services running in containers can fluctuate 
significantly when all containers share CPUs, which latency-sensitive services 
cannot afford in our production environment.

So we need more fine-grained CPU isolation.

Here we propose a solution that uses cgroup cpuset to bind containers to 
different processors; this is inspired by the isolation technique in the [Borg 
system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].

  was:
Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
“cpu.shares” to isolate CPU resources. However,
 * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler 
with no support for differentiated latency.
 * The request latency of services running in containers can fluctuate 
significantly when all containers share CPUs, which latency-sensitive services 
cannot afford in our production environment.

So we need finer-grained CPU isolation.

My co-workers and I propose a solution that uses cgroup cpuset to bind 
containers to different processors; this is inspired by the isolation technique 
in the [Borg 
system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
 Later I will upload a detailed design doc.


> Support CPU isolation for latency-sensitive (LS) service
> 
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resource. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler 
> with no support for differentiated latency.
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, which latency-sensitive services cannot 
> tolerate in our production environment.
> So we need more fine-grained cpu isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to 
> different processors; this is inspired by the isolation technique in the [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8320) Support CPU isolation for latency-sensitive (LS) service

2018-05-22 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483813#comment-16483813
 ] 

Weiwei Yang edited comment on YARN-8320 at 5/22/18 11:39 AM:
-

Some updates: I am working with [~yangjiandan] on polishing the design doc and 
will add more details and explanations this week. We have already built a 
prototype, as shown in the WIP patch [~yangjiandan] has shared. Please feel free 
to comment and share your thoughts.
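
For readers unfamiliar with the mechanism the proposal builds on, here is a minimal sketch of cpuset pinning at the cgroup level. It is illustrative only and not code from the WIP patch; the cgroup mount point, hierarchy name, container id, and CPU list are all assumptions.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CpusetPinningSketch {
  // Hypothetical cgroup v1 cpuset hierarchy; real deployments may differ.
  private static final String CPUSET_ROOT = "/sys/fs/cgroup/cpuset/hadoop-yarn";

  /** Bind a container's cpuset cgroup to the given CPUs and memory node. */
  static void pinContainer(String containerId, String cpus, String mems)
      throws IOException {
    Path cgroup = Paths.get(CPUSET_ROOT, containerId);
    Files.createDirectories(cgroup);
    // Both cpuset.cpus and cpuset.mems must be set before tasks can be attached.
    Files.write(cgroup.resolve("cpuset.cpus"), cpus.getBytes(StandardCharsets.UTF_8));
    Files.write(cgroup.resolve("cpuset.mems"), mems.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    // Reserve cores 2-3 on NUMA node 0 for a latency-sensitive container.
    pinContainer("container_e01_0001_01_000002", "2-3", "0");
  }
}
{code}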


was (Author: cheersyang):
Some updates, I am working with [~yangjiandan] on polishing the design doc, 
will add more details and explanations this week. Please feel free to comment 
and share your thoughts.

> Support CPU isolation for latency-sensitive (LS) service
> 
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resource. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler 
> with no support for differentiated latency.
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, which latency-sensitive services cannot 
> tolerate in our production environment.
> So we need more fine-grained cpu isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to 
> different processors; this is inspired by the isolation technique in the [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8329) Docker client configuration can still be set incorrectly

2018-05-22 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483852#comment-16483852
 ] 

Shane Kumpf commented on YARN-8329:
---

Thanks for the review [~jlowe]! 

{quote}I'm not seeing why the copy is necessary. Eliminating the copy would 
also eliminate the need to do a token identifier decode to construct an 
alias.{quote}

Good point. I think the original method to extract all the credentials from the 
token ByteBuffer has value, but not in its current form or location. I'll put 
up a patch to clean this up.
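
For context, rebuilding a Credentials object from a serialized tokens ByteBuffer generally follows the pattern below. This is an illustrative sketch against the public Hadoop security APIs, not the code under review in this patch.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.io.DataInputByteBuffer;
import org.apache.hadoop.security.Credentials;

public class TokenBufferSketch {
  /** Deserialize a Credentials object from the tokens ByteBuffer. */
  static Credentials readCredentials(ByteBuffer tokens) throws IOException {
    DataInputByteBuffer dib = new DataInputByteBuffer();
    tokens.rewind();
    dib.reset(tokens);                        // expose the buffer as a DataInput
    Credentials credentials = new Credentials();
    credentials.readTokenStorageStream(dib);  // read tokens and secret keys
    tokens.rewind();                          // leave the buffer reusable by the caller
    return credentials;
  }
}
{code}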

> Docker client configuration can still be set incorrectly
> 
>
> Key: YARN-8329
> URL: https://issues.apache.org/jira/browse/YARN-8329
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8329.001.patch
>
>
> YARN-7996 implemented a fix to avoid writing an empty Docker client 
> configuration file, but there are still cases where the {{docker --config}} 
> argument is set in error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8191) Fair scheduler: queue deletion without RM restart

2018-05-22 Thread Gergo Repas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergo Repas updated YARN-8191:
--
Attachment: YARN-8191.014.patch

> Fair scheduler: queue deletion without RM restart
> -
>
> Key: YARN-8191
> URL: https://issues.apache.org/jira/browse/YARN-8191
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: Queue Deletion in Fair Scheduler.pdf, 
> YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, 
> YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, 
> YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch, 
> YARN-8191.009.patch, YARN-8191.010.patch, YARN-8191.011.patch, 
> YARN-8191.012.patch, YARN-8191.013.patch, YARN-8191.014.patch
>
>
> The Fair Scheduler never cleans up queues even if they are deleted in the 
> allocation file, or were dynamically created and are never going to be used 
> again. Queues always remain in memory which leads to two following issues.
>  # Steady fairshares aren’t calculated correctly due to remaining queues
>  # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource 
> Manager:
>  # Static queues without any entries that are removed from fair-scheduler.xml 
> should be deleted from memory.
>  # Dynamic queues without any entries should be deleted.
>  # RM Web UI should only show the queues defined in the scheduler at that 
> point in time.
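
As a rough restatement of the deletion criteria above in code form (purely illustrative; the names below are hypothetical and not taken from the attached patches):

{code:java}
import java.util.Set;

public class QueueCleanupSketch {
  /**
   * A queue is removable when it is empty and either was created dynamically
   * or no longer appears in fair-scheduler.xml.
   */
  static boolean isRemovable(String queueName, boolean isDynamic,
      Set<String> configuredQueues, int runnableApps, int nonRunnableApps) {
    boolean empty = runnableApps == 0 && nonRunnableApps == 0;
    boolean unconfigured = isDynamic || !configuredQueues.contains(queueName);
    return empty && unconfigured;
  }
}
{code}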



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded

2018-05-22 Thread Gergo Repas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergo Repas updated YARN-8273:
--
Attachment: YARN-8273.006.patch

> Log aggregation does not warn if HDFS quota in target directory is exceeded
> ---
>
> Key: YARN-8273
> URL: https://issues.apache.org/jira/browse/YARN-8273
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-8273.000.patch, YARN-8273.001.patch, 
> YARN-8273.002.patch, YARN-8273.003.patch, YARN-8273.004.patch, 
> YARN-8273.005.patch, YARN-8273.006.patch
>
>
> It appears that if an HDFS space quota is set on a target directory for log 
> aggregation and the quota is already exceeded when log aggregation is 
> attempted, zero-byte log files will be written to the HDFS directory; however, 
> the NodeManager logs do not reflect a failure to write the files successfully 
> (i.e. there are no ERROR or WARN messages to this effect).
> An improvement may be worth investigating to alert users to this scenario, as 
> otherwise logs for a YARN application may be missing both on HDFS and locally 
> (after local log cleanup is done) and the user may not otherwise be informed.
> Steps to reproduce:
> * Set a small HDFS space quota on /tmp/logs/username/logs (e.g. 2MB)
> * Write files to HDFS such that /tmp/logs/username/logs is almost 2MB full
> * Run a Spark or MR job in the cluster
> * Observe that zero byte files are written to HDFS after job completion
> * Observe that YARN container logs are also not present on the NM hosts (or 
> are deleted after yarn.nodemanager.delete.debug-delay-sec)
> * Observe that no ERROR or WARN messages appear to be logged in the NM role 
> log
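
As an illustration of the kind of warning the description asks for (a sketch only, not the attached patch; the 90% threshold and method name are assumptions), the remote log directory's quota could be inspected before aggregation:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LogDirQuotaCheckSketch {
  /** Log a warning when the remote log directory's space quota is nearly exhausted. */
  static void warnIfQuotaNearlyFull(Configuration conf, Path remoteLogDir)
      throws Exception {
    FileSystem fs = remoteLogDir.getFileSystem(conf);
    ContentSummary summary = fs.getContentSummary(remoteLogDir);
    long quota = summary.getSpaceQuota();   // -1 means no space quota is set
    if (quota > 0) {
      long consumed = summary.getSpaceConsumed();
      if (consumed >= quota * 0.9) {
        System.err.println("WARN: space quota on " + remoteLogDir + " is "
            + consumed + "/" + quota + " bytes; log aggregation may fail");
      }
    }
  }
}
{code}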



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()

2018-05-22 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8173:
---
Attachment: (was: YARN-8041.004.patch)

> [Router] Implement missing FederationClientInterceptor#getApplications()
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> Implement the Oozie-dependent methods:
> {code:java}
> getApplications()
> getDelegationToken()
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-22 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8259:
--
Priority: Blocker  (was: Major)

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-22 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483845#comment-16483845
 ] 

Shane Kumpf commented on YARN-8259:
---

{quote}System administrator can reserve one cpu core for node manager and all 
the docker inspect call are counted toward saturating one cpu core{quote}
I'm less concerned about the cpu usage and more about docker's client/server 
model and the potential for hangs (that I've seen many of in the past under 
load). Personally, I want the /proc route for my systems and am not using 
hidepid. Losing a container due to an intermittent docker issue isn't really 
acceptable to me when an alternative exists that avoids the issue.

What I could do is implement both the /proc and {{docker inspect}} approaches, 
with a configuration switch to choose the implementation for systems that use 
hidepid (or that lack /proc). Would that be acceptable?

I'm also going to make this a blocker, as all privileged containers are leaked 
on NM restart today.
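
As a rough illustration of the /proc route mentioned above (not code from any attached patch; the PID value is hypothetical), a liveliness probe can simply test whether the container's root process still has a /proc entry:

{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;

public class LivelinessCheckSketch {
  /**
   * The container's root process is considered alive while /proc/<pid> exists.
   * This only works where /proc is visible and not restricted by hidepid.
   */
  static boolean isAlive(int pid) {
    return Files.isDirectory(Paths.get("/proc", Integer.toString(pid)));
  }

  public static void main(String[] args) {
    int pid = 12345; // hypothetical container PID recorded by the NodeManager
    System.out.println("container alive: " + isAlive(pid));
  }
}
{code}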

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483761#comment-16483761
 ] 

genericqa commented on YARN-8273:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
3s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
24s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 48 unchanged - 0 fixed = 49 total (was 48) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
19s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
38s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}105m 26s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8273 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924489/YARN-8273.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  xml  findbugs  checkstyle  |
| uname | Linux dc5dec84fe3e 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 57c2feb |
| maven | 

[jira] [Updated] (YARN-8320) upport CPU isolation for latency-sensitive (LS) service

2018-05-22 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8320:
--
Summary: upport CPU isolation for latency-sensitive  (LS) service  (was: 
Add support CPU isolation for latency-sensitive  (LS) service)

> upport CPU isolation for latency-sensitive  (LS) service
> 
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resource. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler 
> with no support for differentiated latency.
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, which latency-sensitive services cannot 
> tolerate in our production environment.
> So we need finer-grained cpu isolation.
> My co-workers and I propose a solution that uses cgroup cpuset to bind 
> containers to different processors; this is inspired by the isolation 
> technique in the [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
>  Later I will upload a detailed design doc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8285) Remove unused environment variables from the Docker runtime

2018-05-22 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483893#comment-16483893
 ] 

Shane Kumpf commented on YARN-8285:
---

[~ebadger], thanks for the patch! +1 lgtm, I'll commit this later today if 
there are no objections. Note that the patch doesn't apply cleanly, but the 
conflict is straightforward enough that I will address it.

> Remove unused environment variables from the Docker runtime
> ---
>
> Key: YARN-8285
> URL: https://issues.apache.org/jira/browse/YARN-8285
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Eric Badger
>Priority: Trivial
>  Labels: Docker
> Attachments: YARN-8285.001.patch
>
>
> YARN-7430 enabled user remapping for Docker containers by default. As a 
> result, YARN_CONTAINER_RUNTIME_DOCKER_RUN_ENABLE_USER_REMAPPING is no longer 
> used and can be removed.
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE_FILE was added in the original 
> implementation, but was never used and can be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484005#comment-16484005
 ] 

genericqa commented on YARN-8273:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
1s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
22s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
49s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 96m 31s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8273 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924535/YARN-8273.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  xml  findbugs  checkstyle  |
| uname | Linux 6b3464e78886 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 57c2feb |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test 

[jira] [Updated] (YARN-8337) Deadlock In Federation Router

2018-05-22 Thread Jianchao Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianchao Jia updated YARN-8337:
---
Summary: Deadlock In Federation Router  (was: Deadlock Federation Router)

> Deadlock In Federation Router
> -
>
> Key: YARN-8337
> URL: https://issues.apache.org/jira/browse/YARN-8337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Reporter: Jianchao Jia
>Priority: Major
> Attachments: YARN-8337.001.patch
>
>
> We use MySQL InnoDB as the state store engine. In the router log we found a 
> deadlock error like the one below:
> {code:java}
> [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : 
> Unable to insert the newly generated application 
> application_1526295230627_127402
> com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock 
> found when trying to get lock; try restarting transaction
> at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
> at com.mysql.jdbc.Util.getInstance(Util.java:408)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
> at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
> at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
> at 
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
> at 
> com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
> at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
> at 
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
> at 
> org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
> {code}
> Use "show engine innodb status;" command to find what happens 
> {code:java}
> 2018-05-21 15:41:40 7f4685870700
> *** (1) TRANSACTION:
> TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 
> lock_mode X locks gap before rec insert intention waiting
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info 
> bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; 
> asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 0ba5f32d; asc -;;
> 2: len 7; hex dd00280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** (2) TRANSACTION:
> TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (2) HOLDS THE LOCK(S):
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539 
> lock mode S locks gap before rec
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info 

[jira] [Updated] (YARN-8320) Support CPU isolation for latency-sensitive (LS) service

2018-05-22 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8320:
--
Summary: Support CPU isolation for latency-sensitive (LS) service  (was: 
upport CPU isolation for latency-sensitive  (LS) service)

> Support CPU isolation for latency-sensitive (LS) service
> 
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resource. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler 
> with no support for differentiated latency.
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, which latency-sensitive services cannot 
> tolerate in our production environment.
> So we need finer-grained cpu isolation.
> My co-workers and I propose a solution that uses cgroup cpuset to bind 
> containers to different processors; this is inspired by the isolation 
> technique in the [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
>  Later I will upload a detailed design doc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484093#comment-16484093
 ] 

Hudson commented on YARN-8206:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14250 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14250/])
YARN-8206. Sending a kill does not immediately kill docker containers. (jlowe: 
rev 5f11288e41fca2e414dcbea130c7702e29d4d610)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java


> Sending a kill does not immediately kill docker containers
> --
>
> Key: YARN-8206
> URL: https://issues.apache.org/jira/browse/YARN-8206
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8206.001.patch, YARN-8206.002.patch, 
> YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, 
> YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, 
> YARN-8206.009.patch, YARN-8206.010.patch, YARN-8206.011.patch
>
>
> {noformat}
> if (ContainerExecutor.Signal.KILL.equals(signal)
> || ContainerExecutor.Signal.TERM.equals(signal)) {
>   handleContainerStop(containerId, env);
> {noformat}
> Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent 
> for docker containers. However, they should actually be separate. When YARN 
> sends a SIGKILL to a process, it expects the process to die immediately rather 
> than sit around waiting for anything. This ensures an immediate reclamation of 
> resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task 
> might not handle the signal correctly, and will then end up as a failed task 
> instead of a killed task. This is especially bad for preemption. 
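
To make the distinction concrete, the sketch below separates the two signals: KILL maps to an immediate {{docker kill}} and TERM to a graceful {{docker stop}}. It is illustrative only; the class and method names are made up for this example and do not come from the committed change.

{code:java}
public class DockerSignalSketch {
  enum Signal { TERM, KILL }

  /** Build the docker command for a given signal and container name. */
  static String[] commandFor(Signal signal, String containerName) {
    switch (signal) {
      case KILL:
        // Immediate termination; resources are reclaimed right away.
        return new String[] {"docker", "kill", "--signal=KILL", containerName};
      case TERM:
        // Graceful stop; docker sends SIGTERM and escalates after the timeout.
        return new String[] {"docker", "stop", "--time=10", containerName};
      default:
        throw new IllegalArgumentException("unsupported signal: " + signal);
    }
  }
}
{code}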



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-22 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8259:
--
Target Version/s: 3.0.2, 3.2.0, 3.1.1

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-22 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: YARN-8041.005.patch

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch, YARN-8041.004.patch, YARN-8041.005.patch, 
> YARN-8041.006.patch
>
>
> This Jira tracks the implementation of some missing REST invocations in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-05-22 Thread Sunil Govindan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484049#comment-16484049
 ] 

Sunil Govindan commented on YARN-4781:
--

Hi [~eepayne]

The latest patch looks good to me. I tried testing this in a local cluster and 
it looks fine.

However, I have not verified the case where the FairOrdering policy is used 
with weights. Did you get a chance to cross-check that as well? Thanks.

Other than this, I am good with committing this patch.

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.patch
>
>
> We introduced the fairness queue policy in YARN-3319, which lets large 
> applications make progress without starving small applications. However, if a 
> large application takes the queue’s resources, and the containers of the large 
> app have a long lifespan, small applications could still wait for resources for 
> a long time and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we need 
> to preempt resources in queues with the fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-22 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: (was: YARN-8041.005.patch)

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch, YARN-8041.004.patch, YARN-8041.005.patch, 
> YARN-8041.006.patch
>
>
> This Jira tracks the implementation of some missing REST invocations in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-22 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: YARN-8041.006.patch

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch, YARN-8041.004.patch, YARN-8041.005.patch, 
> YARN-8041.006.patch
>
>
> This Jira tracks the implementation of some missing REST invocations in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484050#comment-16484050
 ] 

genericqa commented on YARN-8191:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 25s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 5 new + 88 unchanged - 0 fixed = 93 total (was 88) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 36s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
15s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 30s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}119m  6s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadListener;
 locked 75% of time  Unsynchronized access at 
AllocationFileLoaderService.java:75% of time  Unsynchronized access at 
AllocationFileLoaderService.java:[line 117] |
|  |  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.removeEmptyIncompatibleQueues(String,
 FSQueueType) has Boolean return type and returns explicit null  At 
QueueManager.java:type and returns explicit null  At QueueManager.java:[line 
399] |
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8191 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-05-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484199#comment-16484199
 ] 

Eric Yang commented on YARN-8108:
-

[~yzhangal] This is a regression in Hadoop 3.x, hence it is marked as a 
blocker.  Friendly reminder to PMCs to review this patch to bring this to 
closure.

> RM metrics rest API throws GSSException in kerberized environment
> -
>
> Key: YARN-8108
> URL: https://issues.apache.org/jira/browse/YARN-8108
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kshitij Badani
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-8108.001.patch
>
>
> Test is trying to pull up metrics data from SHS after kiniting as 'test_user'
> It is throwing GSSException as follows
> {code:java}
> b2b460b80713|RUNNING: curl --silent -k -X GET -D 
> /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : 
> http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15
>  07:15:48,757|INFO|MainThread|machine.py:194 - 
> run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0
> 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - 
> getMetricsJsonData()|metrics:
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /proxy/application_1518674952153_0070/metrics/json. 
> Reason:
>  GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {code}
> Root cause: the proxy server on the RM can't be supported in a Kerberos-enabled 
> cluster because AuthenticationFilter is applied twice in the Hadoop code (once in 
> HttpServer2 for the RM, and another instance from AmFilterInitializer for the 
> proxy server). This will require code changes in the hadoop-yarn-server-web-proxy 
> project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8232) RMContainer lost queue name when RM HA happens

2018-05-22 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-8232:
-
Fix Version/s: 2.8.5
   2.9.2
   2.10.0

Thanks, [~ziqian hu]!  We recently ran into the same issue on 2.8 as well, so I 
committed this to branch-3.0, branch-2, branch-2.9, and branch-2.8.

> RMContainer lost queue name when RM HA happens
> --
>
> Key: YARN-8232
> URL: https://issues.apache.org/jira/browse/YARN-8232
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.3
>Reporter: Hu Ziqian
>Assignee: Hu Ziqian
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 2.8.5
>
> Attachments: YARN-8232-branch-2.8.3.001.patch, YARN-8232.001.patch, 
> YARN-8232.002.patch, YARN-8232.003.patch
>
>
> RMContainer has a member variable queuename to store which queue the 
> container belongs to. When RM HA happens and RMContainers are recovered by the 
> scheduler based on NM reports, the queue name isn't recovered and is always 
> null.
> This situation causes some problems. Here is a case in preemption. Preemption 
> uses the container's queue name to deduct preemptable resources when more than 
> one preemption selector is in use (for example, when intra-queue preemption is 
> enabled). The detail is in
> {code:java}
> CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates(){code}
> If the container's queue name is null, this function throws a 
> YarnRuntimeException when it tries to get the container's 
> TempQueuePerPartition, and the preemption fails.
> Our patch solves this problem by setting the container queue name when 
> recovering containers. The patch is based on branch-2.8.3.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-05-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484249#comment-16484249
 ] 

Vinod Kumar Vavilapalli commented on YARN-8338:
---

Full exception trace
{code:java}
java.lang.NoClassDefFoundError: org/objenesis/Objenesis
    at 
org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.(RollingLevelDBTimelineStore.java:174)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at 
org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2532)
    at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2497)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2593)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2619)
    at 
org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.createSummaryStore(EntityGroupFSTimelineStore.java:266)
    at 
org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.serviceInit(EntityGroupFSTimelineStore.java:152)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
    at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:115)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:177)
    at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:187)
Caused by: java.lang.ClassNotFoundException: org.objenesis.Objenesis
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 15 more{code}

> TimelineService V1.5 doesn't come up after HADOOP-15406
> ---
>
> Key: YARN-8338
> URL: https://issues.apache.org/jira/browse/YARN-8338
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
>
> TimelineService V1.5 fails with the following:
> {code}
> java.lang.NoClassDefFoundError: org/objenesis/Objenesis
>   at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.(RollingLevelDBTimelineStore.java:174)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-22 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: YARN-8041.007.patch

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch, YARN-8041.004.patch, YARN-8041.005.patch, 
> YARN-8041.006.patch, YARN-8041.007.patch
>
>
> This Jira tracks the implementation of some missing REST invocations in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8337) Deadlock In Federation Router

2018-05-22 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484286#comment-16484286
 ] 

Giovanni Matteo Fumarola commented on YARN-8337:


Thanks [~jianchao jia] and [~yiran] for finding the bug and working on it.

The same logic is used in *HSQLDBFederationStateStore*. Please update that test 
as well and verify that the fix works there too.

 

> Deadlock In Federation Router
> -
>
> Key: YARN-8337
> URL: https://issues.apache.org/jira/browse/YARN-8337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Reporter: Jianchao Jia
>Priority: Major
> Attachments: YARN-8337.001.patch
>
>
> We use MySQL InnoDB as the state store engine. In the router log we found a 
> deadlock error like the one below:
> {code:java}
> [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : 
> Unable to insert the newly generated application 
> application_1526295230627_127402
> com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock 
> found when trying to get lock; try restarting transaction
> at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
> at com.mysql.jdbc.Util.getInstance(Util.java:408)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
> at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
> at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
> at 
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
> at 
> com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
> at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
> at 
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
> at 
> org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
> {code}
> Use "show engine innodb status;" command to find what happens 
> {code:java}
> 2018-05-21 15:41:40 7f4685870700
> *** (1) TRANSACTION:
> TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 
> lock_mode X locks gap before rec insert intention waiting
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info 
> bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; 
> asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 0ba5f32d; asc -;;
> 2: len 7; hex dd00280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** (2) TRANSACTION:
> TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (2) HOLDS THE LOCK(S):
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> 

[jira] [Commented] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-22 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484295#comment-16484295
 ] 

Giovanni Matteo Fumarola commented on YARN-8041:


[~yiran] thanks for the patch.

The latest patch has some problems: checkstyle, whitespace, and, most importantly, 
a failed unit test in the Router.
Please fix those.

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch, YARN-8041.004.patch, YARN-8041.005.patch, 
> YARN-8041.006.patch, YARN-8041.007.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484238#comment-16484238
 ] 

genericqa commented on YARN-8041:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 46s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 10 new 
+ 18 unchanged - 0 fixed = 28 total (was 18) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
13s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 
38s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 19s{color} 
| {color:red} hadoop-yarn-server-router in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}135m 51s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.router.webapp.TestFederationInterceptorREST |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8041 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924556/YARN-8041.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 094db537de3d 

[jira] [Created] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-05-22 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-8338:
-

 Summary: TimelineService V1.5 doesn't come up after HADOOP-15406
 Key: YARN-8338
 URL: https://issues.apache.org/jira/browse/YARN-8338
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


TimelineService V1.5 fails with the following:

{code}
java.lang.NoClassDefFoundError: org/objenesis/Objenesis
at 
org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.<init>(RollingLevelDBTimelineStore.java:174)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-22 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484257#comment-16484257
 ] 

Eric Payne commented on YARN-8292:
--

{quote}Actually this is required after the change.
{quote}
Yes, I see now.
{quote}TestPreemptionForQueueWithPriorities
{quote}
{{TestPreemptionForQueueWithPriorities}} passes for me in my local environment.
{quote}doPreempt = Resources.lessThan(rc, clusterResource,
 Resources
 .componentwiseMin(toObtainAfterPreemption, Resources.none()),
 Resources.componentwiseMin(toObtainByPartition, Resources.none()));
{quote}
I don't think we want the above code to {{componentwiseMin}} the {{toObtain}} 
values with 0, since that will set _all_ positive resource entities to 0.
{quote}Can we address this in a separate JIRA if we cannot come with some 
simple solution?
{quote}
In my tests, the current implementation of preemption does not seem to work 
anyway when extensible resources are enabled, so this seems to be a larger 
problem. You are right that it should be its own JIRA.

I give my +1 here. [~jlowe] / [~sunilg], do you have additional comments?
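
To make the clamping concern concrete, here is a small illustration using plain int 
arrays rather than the actual Resource/Resources classes (a hedged sketch, not 
scheduler code): a componentwise min with the zero vector keeps the negative 
(over-obtained) components but turns every positive component into 0, so the 
comparison above can no longer see how much of each resource is still left to 
obtain.

{code:java}
public class ComponentwiseMinDemo {
  // Per-component minimum of two equally sized resource vectors.
  static int[] componentwiseMin(int[] a, int[] b) {
    int[] out = new int[a.length];
    for (int i = 0; i < a.length; i++) {
      out[i] = Math.min(a[i], b[i]);
    }
    return out;
  }

  public static void main(String[] args) {
    int[] zero = {0, 0, 0};
    // e.g. memory still to obtain, vcores already over-obtained, and one
    // extensible resource still to obtain
    int[] toObtain = {4, -2, 1};
    int[] clamped = componentwiseMin(toObtain, zero);
    // Prints [0, -2, 0]: the positive entries (4 and 1) are lost.
    System.out.println(java.util.Arrays.toString(clamped));
  }
}
{code}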

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There're 3 resource types. Total resource of the cluster is 30:18:6
> For both of a/b, there're 3 containers running, each of container is 2:2:1.
> Queue c uses 0 resource, and have 1:1:1 pending resource.
> Under existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484271#comment-16484271
 ] 

Eric Yang commented on YARN-8259:
-

[~shaneku...@gmail.com] The proposal to implement both is okay, but we can make 
better software with sensible optimization and pick a solution that works for all 
scenarios without adding extra administration tasks.  There is no objection to the 
current approach.  We are aware that the hidepid corner case can add system 
administration work to whitelist the node manager so it can access all pids.  We 
also know it costs more resources to fork/exec with the docker inspect approach.  
Human labor to configure the OS with knowledge of Hadoop details is usually more 
expensive than adding processors or RAM.  It would be great if the solution could 
work without an additional configuration flag or extra hardware resources.  This 
means doing the pid check as a privileged user via container-executor may be the 
solution system administrators prefer, since it adds no overhead to their 
administration chores.  Can the proc pid check work in a docker-in-docker 
environment?

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486511#comment-16486511
 ] 

genericqa commented on YARN-8310:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
44s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 29m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
47s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
27s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
56s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8310 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924026/YARN-8310.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c552e7cde900 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 43be9ab |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20828/testReport/ |
| Max. process+thread count | 1348 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats

2018-05-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486637#comment-16486637
 ] 

Miklos Szegedi commented on YARN-8310:
--

I will backport this to branches branch-2, branch-3.0 and branch 3.1

> Handle old NMTokenIdentifier, AMRMTokenIdentifier, and 
> ContainerTokenIdentifier formats
> ---
>
> Key: YARN-8310
> URL: https://issues.apache.org/jira/browse/YARN-8310
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Major
> Attachments: YARN-8310.001.patch, YARN-8310.002.patch, 
> YARN-8310.003.patch, YARN-8310.branch-2.001.patch, 
> YARN-8310.branch-2.002.patch, YARN-8310.branch-2.003.patch
>
>
> In some recent upgrade testing, we saw this error causing the NodeManager to 
> fail to startup afterwards:
> {noformat}
> org.apache.hadoop.service.ServiceStateException: 
> com.google.protobuf.InvalidProtocolBufferException: Protocol message 
> contained an invalid tag (zero).
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
> message contained an invalid tag (zero).
>   at 
> com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
>   at 
> com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1860)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1824)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011)
>   at 
> com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686)
>   at 
> org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254)
>   at 
> org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177)
>   at 
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   ... 5 more
> {noformat}
> The NodeManager fails because it's trying to read a 
> {{ContainerTokenIdentifier}} in the "old" format before we changed them to 
> protobufs (YARN-668).  This is very similar to YARN-5594 where we ran into a 
> similar problem with the ResourceManager and RM Delegation Tokens.
> To provide a better experience, we should make the code able to read the old 
> format if it's unable to read it using the new format.  We didn't run into 
> any errors with the other two types of tokens that YARN-668 incompatibly 
> changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix 
> those while we're at it.
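
As a rough illustration of the fallback described above (a sketch under assumed 
interfaces, not the actual patch): attempt the protobuf-based parse first and, if 
it fails with InvalidProtocolBufferException, re-read the same bytes with a legacy 
reader for the pre-YARN-668 layout. Keeping the raw byte array around, rather than 
a half-consumed stream, is what makes the second attempt possible.

{code:java}
import com.google.protobuf.InvalidProtocolBufferException;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public final class TokenFormatFallback {

  /** Parses the new (protobuf) wire format; hypothetical stand-in interface. */
  public interface NewFormatReader<T> {
    T read(byte[] bytes) throws InvalidProtocolBufferException;
  }

  /** Parses the old pre-YARN-668 layout; hypothetical stand-in interface. */
  public interface OldFormatReader<T> {
    T read(DataInputStream in) throws IOException;
  }

  public static <T> T readWithFallback(byte[] identifierBytes,
                                       NewFormatReader<T> newReader,
                                       OldFormatReader<T> oldReader)
      throws IOException {
    try {
      return newReader.read(identifierBytes);
    } catch (InvalidProtocolBufferException e) {
      // New format failed (e.g. "invalid tag (zero)"): retry with the old layout.
      try (DataInputStream in =
               new DataInputStream(new ByteArrayInputStream(identifierBytes))) {
        return oldReader.read(in);
      }
    }
  }

  private TokenFormatFallback() {
  }
}
{code}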



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, 

[jira] [Commented] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-05-22 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486638#comment-16486638
 ] 

Wilfred Spiegelenburg commented on YARN-7998:
-

I don't think we should fail the restore of a running application at all when the 
ACL was changed. Logging the failure is good, but simply killing the application 
is not the right thing to do. We should either not start up at all and tell the 
end user to fix the configuration, or allow the application to be restored and 
finish. An ACL change made on a running RM also does not trigger a review of 
running applications: we do not kill a running application that is no longer 
allowed by the ACL when it changes, and restore should not behave any differently.

Based on the details in YARN-7913 I think we need to close this as a duplicate 
and come up with a general fix that handles all these cases, rather than one-off 
changes for a specific corner case.

> RM crashes with NPE during recovering if ACL configuration was changed
> --
>
> Key: YARN-7998
> URL: https://issues.apache.org/jira/browse/YARN-7998
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Major
> Attachments: YARN-7998.000.patch, YARN-7998.001.patch, 
> YARN-7998.002.patch, YARN-7998.003.patch
>
>
> RM crashes with NPE during failover because ACL configurations were changed 
> as a result we no longer have a rights to submit an application to a queue.
> Scenario:
>  # Submit an application
>  # Change ACL configuration for a queue that accepted the application so that 
> an owner of the application will no longer have a rights to submit this 
> application.
>  # Restart RM.
> As a result, we get NPE:
> 2018-02-27 18:14:00,968 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:738)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1098)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1044)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-05-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486678#comment-16486678
 ] 

Vinod Kumar Vavilapalli commented on YARN-8338:
---

[~jlowe], never mind, removing the exclusion failed in compilation itself. We 
will have to declare a version.

{code}
[INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (depcheck) @ hadoop-aws ---
[WARNING]
Dependency convergence error for org.objenesis:objenesis:2.1 paths to 
dependency are:
+-org.apache.hadoop:hadoop-aws:3.1.1-SNAPSHOT
  +-com.amazonaws:DynamoDBLocal:1.11.86
+-org.mockito:mockito-core:1.10.19
  +-org.objenesis:objenesis:2.1
and
+-org.apache.hadoop:hadoop-aws:3.1.1-SNAPSHOT
  +-org.apache.hadoop:hadoop-yarn-server-tests:3.1.1-SNAPSHOT
+-org.apache.hadoop:hadoop-yarn-server-resourcemanager:3.1.1-SNAPSHOT
  
+-org.apache.hadoop:hadoop-yarn-server-applicationhistoryservice:3.1.1-SNAPSHOT
+-de.ruedigermoeller:fst:2.50
  +-org.objenesis:objenesis:2.5.1

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
failed with message:
Failed while enforcing releasability. See above detailed error message.
{code}

> TimelineService V1.5 doesn't come up after HADOOP-15406
> ---
>
> Key: YARN-8338
> URL: https://issues.apache.org/jira/browse/YARN-8338
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: YARN-8338.txt
>
>
> TimelineService V1.5 fails with the following:
> {code}
> java.lang.NoClassDefFoundError: org/objenesis/Objenesis
>   at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.<init>(RollingLevelDBTimelineStore.java:174)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7899) [AMRMProxy] Stateful FederationInterceptor for pending requests

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486508#comment-16486508
 ] 

genericqa commented on YARN-7899:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
26s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 13 new + 16 unchanged - 0 fixed = 29 total (was 16) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
19s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
21s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
37s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}114m 50s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7899 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924637/YARN-7899.v1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d8722a246fd9 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 43be9ab |
| 

[jira] [Updated] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils

2018-05-22 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8336:
---
Attachment: YARN-8336.v2.patch

> Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
> -
>
> Key: YARN-8336
> URL: https://issues.apache.org/jira/browse/YARN-8336
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8336.v1.patch, YARN-8336.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.
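
For context, a hedged sketch of the close/destroy pattern this is about, written 
against the Jersey 1.x client API used by the YARN web-service helpers; the class 
name and REST path are placeholders, not the actual patch:

{code:java}
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import javax.ws.rs.core.MediaType;

public class ClosedWebServiceCall {
  public static String getSchedulerConf(String rmWebAppUrl) {
    Client client = Client.create();
    ClientResponse response = null;
    try {
      response = client.resource(rmWebAppUrl)
          .path("ws").path("v1").path("cluster").path("scheduler-conf")
          .accept(MediaType.APPLICATION_JSON)
          .get(ClientResponse.class);
      return response.getEntity(String.class);
    } finally {
      // Release the pooled connection and the client itself even when the
      // request throws; otherwise the connection leaks.
      if (response != null) {
        response.close();
      }
      client.destroy();
    }
  }
}
{code}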



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-05-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486530#comment-16486530
 ] 

Eric Yang edited comment on YARN-8342 at 5/23/18 12:58 AM:
---

The current behavior is documented in 
[YARN-7516|https://issues.apache.org/jira/browse/YARN-7516?focusedCommentId=16353125=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16353125].
  A non-trusted image is not allowed to supply a launch command to the container 
due to the 
[reason|https://issues.apache.org/jira/browse/YARN-7516?focusedCommentId=16347441=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16347441]
 stated by Shane.  We don't allow mounting host disks into untrusted images, to 
prevent the image from placing unauthorized files that cannot be erased in the 
localizer directory.  When an untrusted image is used in yarn mode, this generates 
a launch_container.sh that runs an empty bash command and exits immediately, 
according to Shane.  The end result is somewhat unexpected even though it 
minimizes the security risks.

The solution is to set YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true 
in yarn-env.sh, which makes docker mode the cluster default.  There is no 
launch_container.sh required in docker mode, and we might be able to lift the 
restriction that drops the launch command.


was (Author: eyang):
The current behavior is documented in 
[YARN-7516|https://issues.apache.org/jira/browse/YARN-7516?focusedCommentId=16353125=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16353125].
  Non-trusted image is not allowed to supply launch command into container due 
to 
[reason|https://issues.apache.org/jira/browse/YARN-7516?focusedCommentId=16353125=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16353125]
 stated by Shane.  We don't allow mounting of host disks to untrusted image to 
prevent the image from putting unauthorized files that can not be erased in the 
localizer directory.  When using untrusted image with yarn mode, this will 
generate a launch_container.sh that runs a empty bash command and exit 
immediately according to Shane.  The end result is some what unexpected even 
though it minimized the security risks. 

The solution is to set YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true 
in yarn-env.sh, and this will turn the cluster into docker mode as default.  
There is no launch_container.sh required in docker mode, and we might be able 
to lift drop launch command restriction.

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Priority: Critical
>  Labels: Docker
>
> During test of the Docker feature, I found that if a container comes from 
> non-privileged docker registry, the specified launch command will be ignored. 
> Container will success without any log, which is very confusing to end users. 
> And this behavior is inconsistent to containers from privileged docker 
> registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix unit tests on Windows

2018-05-22 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8344:
---
Summary: Missing nm.close() in TestNodeManagerResync to fix unit tests on 
Windows  (was: Missing nm.close() in TestNodeManagerResync)

> Missing nm.close() in TestNodeManagerResync to fix unit tests on Windows
> 
>
> Key: YARN-8344
> URL: https://issues.apache.org/jira/browse/YARN-8344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8344.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix unit tests on Windows

2018-05-22 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486577#comment-16486577
 ] 

Giovanni Matteo Fumarola commented on YARN-8344:


A missing nm.close() lets other unit tests fail.

The @SuppressWarnings("unchecked") annotations are no longer valid.
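
A minimal sketch of the teardown pattern (assumed JUnit 4 structure, not the actual 
TestNodeManagerResync code): keep a reference to the NodeManager each test starts 
and always stop or close it in @After, so a failing test cannot leave ports and 
state behind for the next one.

{code:java}
import org.apache.hadoop.service.ServiceOperations;
import org.apache.hadoop.yarn.server.nodemanager.NodeManager;
import org.junit.After;

public abstract class NodeManagerTestBase {
  // Each test assigns the NodeManager it starts to this field.
  protected NodeManager nm;

  @After
  public void tearDownNodeManager() {
    // stopQuietly is null-safe and swallows shutdown exceptions;
    // nm.close() would also work, since Service extends Closeable.
    ServiceOperations.stopQuietly(nm);
    nm = null;
  }
}
{code}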

> Missing nm.close() in TestNodeManagerResync to fix unit tests on Windows
> 
>
> Key: YARN-8344
> URL: https://issues.apache.org/jira/browse/YARN-8344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8344.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486607#comment-16486607
 ] 

genericqa commented on YARN-8336:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
19s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 28m 
21s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}120m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8336 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924656/YARN-8336.v2.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5d892f15fcfe 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 68c7fd8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 

[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486618#comment-16486618
 ] 

genericqa commented on YARN-8292:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m  
9s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 11s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 7 new + 98 unchanged - 0 fixed = 105 total (was 98) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
27s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 57s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 52s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |
|   | 
hadoop.yarn.server.resourcemanager.monitor.capacity.TestPreemptionForQueueWithPriorities
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8292 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924650/YARN-8292.007.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 86a2cc16e4d0 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 
21:23:04 UTC 2018 x86_64 x86_64 x86_64 

[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-05-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486530#comment-16486530
 ] 

Eric Yang commented on YARN-8342:
-

The current behavior is documented in 
[YARN-7516|https://issues.apache.org/jira/browse/YARN-7516?focusedCommentId=16353125=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16353125].
  A non-trusted image is not allowed to supply a launch command to the container 
due to the 
[reason|https://issues.apache.org/jira/browse/YARN-7516?focusedCommentId=16353125=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16353125]
 stated by Shane.  We don't allow mounting host disks into untrusted images, to 
prevent the image from placing unauthorized files that cannot be erased in the 
localizer directory.  When an untrusted image is used in yarn mode, this generates 
a launch_container.sh that runs an empty bash command and exits immediately, 
according to Shane.  The end result is somewhat unexpected even though it 
minimizes the security risks.

The solution is to set YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true 
in yarn-env.sh, which makes docker mode the cluster default.  There is no 
launch_container.sh required in docker mode, and we might be able to lift the 
restriction that drops the launch command.

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Priority: Critical
>  Labels: Docker
>
> During test of the Docker feature, I found that if a container comes from 
> non-privileged docker registry, the specified launch command will be ignored. 
> Container will success without any log, which is very confusing to end users. 
> And this behavior is inconsistent to containers from privileged docker 
> registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8344) Missing nm.close() in TestNodeManagerResync

2018-05-22 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8344:
---
Attachment: YARN-8344.v1.patch

> Missing nm.close() in TestNodeManagerResync
> ---
>
> Key: YARN-8344
> URL: https://issues.apache.org/jira/browse/YARN-8344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8344.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8344) Missing nm.close() in TestNodeManagerResync

2018-05-22 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-8344:
--

 Summary: Missing nm.close() in TestNodeManagerResync
 Key: YARN-8344
 URL: https://issues.apache.org/jira/browse/YARN-8344
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Giovanni Matteo Fumarola
Assignee: Giovanni Matteo Fumarola






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats

2018-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486644#comment-16486644
 ] 

Hudson commented on YARN-8310:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14258 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14258/])
YARN-8310. Handle old NMTokenIdentifier, AMRMTokenIdentifier, and 
(miklos.szegedi: rev 3e5f7ea986600e084fcac723b0423e7de1b3bb8a)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenIdentifier.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/NMTokenIdentifier.java


> Handle old NMTokenIdentifier, AMRMTokenIdentifier, and 
> ContainerTokenIdentifier formats
> ---
>
> Key: YARN-8310
> URL: https://issues.apache.org/jira/browse/YARN-8310
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Major
> Attachments: YARN-8310.001.patch, YARN-8310.002.patch, 
> YARN-8310.003.patch, YARN-8310.branch-2.001.patch, 
> YARN-8310.branch-2.002.patch, YARN-8310.branch-2.003.patch
>
>
> In some recent upgrade testing, we saw this error causing the NodeManager to 
> fail to startup afterwards:
> {noformat}
> org.apache.hadoop.service.ServiceStateException: 
> com.google.protobuf.InvalidProtocolBufferException: Protocol message 
> contained an invalid tag (zero).
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
> message contained an invalid tag (zero).
>   at 
> com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
>   at 
> com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1860)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1824)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011)
>   at 
> com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686)
>   at 
> org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254)
>   at 
> org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177)
>   at 
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   ... 5 more
> {noformat}
> The NodeManager fails because it's trying 

[jira] [Commented] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486643#comment-16486643
 ] 

genericqa commented on YARN-7998:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} YARN-7998 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-7998 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913213/YARN-7998.003.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20835/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> RM crashes with NPE during recovering if ACL configuration was changed
> --
>
> Key: YARN-7998
> URL: https://issues.apache.org/jira/browse/YARN-7998
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Major
> Attachments: YARN-7998.000.patch, YARN-7998.001.patch, 
> YARN-7998.002.patch, YARN-7998.003.patch
>
>
> RM crashes with an NPE during failover because the ACL configuration was 
> changed; as a result, the user no longer has the right to submit an 
> application to the queue.
> Scenario:
>  # Submit an application.
>  # Change the ACL configuration for the queue that accepted the application so 
> that the owner of the application no longer has the right to submit this 
> application.
>  # Restart the RM.
> As a result, we get NPE:
> 2018-02-27 18:14:00,968 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:738)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1098)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1044)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7340) Missing the time stamp in exception message in Class NoOverCommitPolicy

2018-05-22 Thread Dinesh Chitlangia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486713#comment-16486713
 ] 

Dinesh Chitlangia commented on YARN-7340:
-

[~yufeigu] - I have updated the log message to reflect the startTime. Kindly 
review the patch. I have not updated any unit tests for this.

Thank you.

> Missing the time stamp in exception message in Class NoOverCommitPolicy
> ---
>
> Key: YARN-7340
> URL: https://issues.apache.org/jira/browse/YARN-7340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: reservation system
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Dinesh Chitlangia
>Priority: Minor
>  Labels: newbie++
> Attachments: YARN-7340.001.patch
>
>
> It could be easily figured out by reading code.
> {code}
>   throw new ResourceOverCommitException(
>   "Resources at time " + " would be overcommitted by "
>   + "accepting reservation: " + reservation.getReservationId());
> {code}
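A hedged sketch of what the corrected message could look like once the start time is included (the accessor name is an assumption, not necessarily what the patch uses):
{code}
// Sketch only: include the timestamp that the original message omitted.
long startTime = reservation.getStartTime();   // assumed accessor
throw new ResourceOverCommitException(
    "Resources at time " + startTime + " would be overcommitted by "
        + "accepting reservation: " + reservation.getReservationId());
{code}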



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-05-22 Thread Hsin-Liang Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484721#comment-16484721
 ] 

Hsin-Liang Huang edited comment on YARN-8326 at 5/23/18 12:18 AM:
--

[~eyang] This afternoon I tried the command and the performance was 
dramatically improved. It used to run in 8 seconds; now it runs in 3 seconds 
consistently. I then compared with the other 3.0 cluster where I did not make 
the property changes you suggested, and it still ran in 8 seconds 
consistently. I am going to run our test cases to see if the performance is 
also improved there.


was (Author: hlhu...@us.ibm.com):
[~eyang]   this afternoon,  I tried the command and the performance was 
dramatically improved.  It used to run 8 seconds, now it ran 3 seconds 
consistently, then I compared with the other HDP 3.0 cluster which I didn't 
make the properties changes that you suggested, and it still ran 8 seconds 
consistently.   I am going to run our testcases to see if the performance is 
also improved there. 

> Yarn 3.0 seems runs slower than Yarn 2.6
> 
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
> Environment: This is the yarn-site.xml for 3.0. 
>  
> 
> 
>  hadoop.registry.dns.bind-port
>  5353
>  
> 
>  hadoop.registry.dns.domain-name
>  hwx.site
>  
> 
>  hadoop.registry.dns.enabled
>  true
>  
> 
>  hadoop.registry.dns.zone-mask
>  255.255.255.0
>  
> 
>  hadoop.registry.dns.zone-subnet
>  172.17.0.0
>  
> 
>  manage.include.files
>  false
>  
> 
>  yarn.acl.enable
>  false
>  
> 
>  yarn.admin.acl
>  yarn
>  
> 
>  yarn.client.nodemanager-connect.max-wait-ms
>  6
>  
> 
>  yarn.client.nodemanager-connect.retry-interval-ms
>  1
>  
> 
>  yarn.http.policy
>  HTTP_ONLY
>  
> 
>  yarn.log-aggregation-enable
>  false
>  
> 
>  yarn.log-aggregation.retain-seconds
>  2592000
>  
> 
>  yarn.log.server.url
>  
> [http://xx:19888/jobhistory/logs|http://whiny2.fyre.ibm.com:19888/jobhistory/logs]
>  
> 
>  yarn.log.server.web-service.url
>  
> [http://xx:8188/ws/v1/applicationhistory|http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory]
>  
> 
>  yarn.node-labels.enabled
>  false
>  
> 
>  yarn.node-labels.fs-store.retry-policy-spec
>  2000, 500
>  
> 
>  yarn.node-labels.fs-store.root-dir
>  /system/yarn/node-labels
>  
> 
>  yarn.nodemanager.address
>  0.0.0.0:45454
>  
> 
>  yarn.nodemanager.admin-env
>  MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX
>  
> 
>  yarn.nodemanager.aux-services
>  mapreduce_shuffle,spark2_shuffle,timeline_collector
>  
> 
>  yarn.nodemanager.aux-services.mapreduce_shuffle.class
>  org.apache.hadoop.mapred.ShuffleHandler
>  
> 
>  yarn.nodemanager.aux-services.spark2_shuffle.class
>  org.apache.spark.network.yarn.YarnShuffleService
>  
> 
>  yarn.nodemanager.aux-services.spark2_shuffle.classpath
>  /usr/spark2/aux/*
>  
> 
>  yarn.nodemanager.aux-services.spark_shuffle.class
>  org.apache.spark.network.yarn.YarnShuffleService
>  
> 
>  yarn.nodemanager.aux-services.timeline_collector.class
>  
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService
>  
> 
>  yarn.nodemanager.bind-host
>  0.0.0.0
>  
> 
>  yarn.nodemanager.container-executor.class
>  
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
>  
> 
>  yarn.nodemanager.container-metrics.unregister-delay-ms
>  6
>  
> 
>  yarn.nodemanager.container-monitor.interval-ms
>  3000
>  
> 
>  yarn.nodemanager.delete.debug-delay-sec
>  0
>  
> 
>  
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
>  90
>  
> 
>  yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
>  1000
>  
> 
>  yarn.nodemanager.disk-health-checker.min-healthy-disks
>  0.25
>  
> 
>  yarn.nodemanager.health-checker.interval-ms
>  135000
>  
> 
>  yarn.nodemanager.health-checker.script.timeout-ms
>  6
>  
> 
>  
> yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage
>  false
>  
> 
>  yarn.nodemanager.linux-container-executor.group
>  hadoop
>  
> 
>  
> yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users
>  false
>  
> 
>  yarn.nodemanager.local-dirs
>  /hadoop/yarn/local
>  
> 
>  yarn.nodemanager.log-aggregation.compression-type
>  gz
>  
> 
>  yarn.nodemanager.log-aggregation.debug-enabled
>  false
>  
> 
>  yarn.nodemanager.log-aggregation.num-log-files-per-app
>  30
>  
> 
>  
> yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds
>  3600
>  
> 
>  yarn.nodemanager.log-dirs
>  /hadoop/yarn/log
>  
> 
>  yarn.nodemanager.log.retain-seconds
>  604800
>  
> 
>  yarn.nodemanager.pmem-check-enabled
>  false
>  
> 
>  yarn.nodemanager.recovery.dir
>  

[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-05-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486545#comment-16486545
 ] 

Vinod Kumar Vavilapalli commented on YARN-8342:
---

Looks like the name {{docker.privileged-containers.registries}} is very 
misleading. It doesn't apply only to Docker privileged containers, right? If 
so, we should fix this name.

bq. When using an untrusted image with yarn mode, this will generate a 
launch_container.sh that runs an empty bash command and exits immediately, 
according to Shane
Why not simply take the launch command given by the user and let it fail 
instead of silently replacing it with an empty bash command?

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Priority: Critical
>  Labels: Docker
>
> During testing of the Docker feature, I found that if a container comes from 
> a non-privileged docker registry, the specified launch command will be ignored. 
> The container will succeed without any log, which is very confusing to end users. 
> This behavior is also inconsistent with containers from privileged docker 
> registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486590#comment-16486590
 ] 

genericqa commented on YARN-8334:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 25m 
56s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} YARN-7402 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
10s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} YARN-7402 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 58s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
33s{color} | {color:green} hadoop-yarn-server-globalpolicygenerator in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 83m 29s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-8334 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924657/YARN-8334-YARN-7402.v2.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 16db2c1ede7d 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 
19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | YARN-7402 / f9c69ca |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20833/testReport/ |
| Max. process+thread count | 304 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator
 |
| Console output | 

[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups

2018-05-22 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-4599:
-
Attachment: YARN-4599.016.patch

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, 
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, 
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, 
> YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch, 
> YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.012.patch, 
> YARN-4599.013.patch, YARN-4599.014.patch, YARN-4599.015.patch, 
> YARN-4599.016.patch, YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.
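For context, explicitly setting OOM control for a container's memory cgroup comes down to writing to its {{memory.oom_control}} file; a minimal hedged sketch, with an illustrative path parameter rather than the patch's actual code:
{code}
// Illustrative only: pause the kernel OOM killer for one container's cgroup
// so the NodeManager can decide which container to kill instead.
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public final class OomControlSketch {
  public static void disableOomKiller(String containerCgroupPath) throws Exception {
    // Writing "1" to memory.oom_control disables the per-cgroup OOM killer;
    // tasks that hit the limit are paused instead of killed immediately.
    Files.write(Paths.get(containerCgroupPath, "memory.oom_control"),
        "1".getBytes(StandardCharsets.UTF_8));
  }
}
{code}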



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-22 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8334:
---
Attachment: YARN-8334-YARN-7402.v2.patch

> Fix potential connection leak in GPGUtils
> -
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch, 
> YARN-8334-YARN-7402.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.
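A hedged sketch of the cleanup being requested, assuming the Jersey 1.x client API (the real GPGUtils method signature may differ):
{code}
// Sketch only: always release the response and destroy the client,
// otherwise the underlying HTTP connection can leak.
import javax.ws.rs.core.MediaType;
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;

public final class GpgUtilsSketch {
  public static <T> T invokeRm(String webAddress, String path, Class<T> returnType) {
    Client client = Client.create();
    ClientResponse response = null;
    try {
      response = client.resource(webAddress).path(path)
          .accept(MediaType.APPLICATION_XML).get(ClientResponse.class);
      return response.getEntity(returnType);
    } finally {
      // The two calls this JIRA adds: without them the connection can leak.
      if (response != null) {
        response.close();
      }
      client.destroy();
    }
  }
}
{code}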



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats

2018-05-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486628#comment-16486628
 ] 

Miklos Szegedi commented on YARN-8310:
--

Committed to trunk. Thank you for the patch [~rkanter] and for the review 
[~grepas].

> Handle old NMTokenIdentifier, AMRMTokenIdentifier, and 
> ContainerTokenIdentifier formats
> ---
>
> Key: YARN-8310
> URL: https://issues.apache.org/jira/browse/YARN-8310
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Major
> Attachments: YARN-8310.001.patch, YARN-8310.002.patch, 
> YARN-8310.003.patch, YARN-8310.branch-2.001.patch, 
> YARN-8310.branch-2.002.patch, YARN-8310.branch-2.003.patch
>
>
> In some recent upgrade testing, we saw this error causing the NodeManager to 
> fail to startup afterwards:
> {noformat}
> org.apache.hadoop.service.ServiceStateException: 
> com.google.protobuf.InvalidProtocolBufferException: Protocol message 
> contained an invalid tag (zero).
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
> message contained an invalid tag (zero).
>   at 
> com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
>   at 
> com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1860)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1824)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011)
>   at 
> com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686)
>   at 
> org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254)
>   at 
> org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177)
>   at 
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   ... 5 more
> {noformat}
> The NodeManager fails because it's trying to read a 
> {{ContainerTokenIdentifier}} in the "old" format before we changed them to 
> protobufs (YARN-668).  This is very similar to YARN-5594 where we ran into a 
> similar problem with the ResourceManager and RM Delegation Tokens.
> To provide a better experience, we should make the code able to read the old 
> format if it's unable to read it using the new format.  We didn't run into 
> any errors with the other two types of tokens that YARN-668 incompatibly 
> changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix 
> those while we're at it.
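A hedged sketch of the fallback described above, written as a fragment of {{ContainerTokenIdentifier}} (helper names are assumptions, not the actual patch):
{code}
// Sketch only: buffer the bytes so we can retry, try the protobuf format
// first, then fall back to the pre-YARN-668 layout.
@Override
public void readFields(DataInput in) throws IOException {
  byte[] data = IOUtils.readFullyToByteArray(in);   // buffering helper name assumed
  try {
    proto = ContainerTokenIdentifierProto.parseFrom(data);   // new format
  } catch (InvalidProtocolBufferException e) {
    // Old (pre-protobuf) format: read the legacy Writable fields instead.
    readFieldsInOldFormat(
        new DataInputStream(new ByteArrayInputStream(data)));   // hypothetical helper
  }
}
{code}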



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For 

[jira] [Commented] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats

2018-05-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486626#comment-16486626
 ] 

Miklos Szegedi commented on YARN-8310:
--

+1 LGTM.

> Handle old NMTokenIdentifier, AMRMTokenIdentifier, and 
> ContainerTokenIdentifier formats
> ---
>
> Key: YARN-8310
> URL: https://issues.apache.org/jira/browse/YARN-8310
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Major
> Attachments: YARN-8310.001.patch, YARN-8310.002.patch, 
> YARN-8310.003.patch, YARN-8310.branch-2.001.patch, 
> YARN-8310.branch-2.002.patch, YARN-8310.branch-2.003.patch
>
>
> In some recent upgrade testing, we saw this error causing the NodeManager to 
> fail to startup afterwards:
> {noformat}
> org.apache.hadoop.service.ServiceStateException: 
> com.google.protobuf.InvalidProtocolBufferException: Protocol message 
> contained an invalid tag (zero).
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
> message contained an invalid tag (zero).
>   at 
> com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
>   at 
> com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1860)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1824)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011)
>   at 
> com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686)
>   at 
> org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254)
>   at 
> org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177)
>   at 
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   ... 5 more
> {noformat}
> The NodeManager fails because it's trying to read a 
> {{ContainerTokenIdentifier}} in the "old" format before we changed them to 
> protobufs (YARN-668).  This is very similar to YARN-5594 where we ran into a 
> similar problem with the ResourceManager and RM Delegation Tokens.
> To provide a better experience, we should make the code able to read the old 
> format if it's unable to read it using the new format.  We didn't run into 
> any errors with the other two types of tokens that YARN-668 incompatibly 
> changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix 
> those while we're at it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix unit tests on Windows

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486641#comment-16486641
 ] 

genericqa commented on YARN-8344:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 24s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 1 new + 29 unchanged - 2 fixed = 30 total (was 31) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 36s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m  
4s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8344 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924672/YARN-8344.v1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1785a9e0357b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 68c7fd8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20834/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20834/testReport/ |
| Max. process+thread count | 341 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 

[jira] [Created] (YARN-8341) Yarn Service: Integration tests

2018-05-22 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8341:
---

 Summary: Yarn Service: Integration tests 
 Key: YARN-8341
 URL: https://issues.apache.org/jira/browse/YARN-8341
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Chandni Singh
Assignee: Chandni Singh


In order to test the REST API end-to-end, we can add integration tests for the 
Yarn service API.
The integration tests
* belong to the junit category {{IntegrationTest}}.
* will only be run when triggered by executing {{mvn failsafe:integration-test}}
* the surefire plugin for regular tests excludes {{IntegrationTest}}
* RM host, user name, and any additional properties needed to execute the tests 
against a cluster can be passed as system properties.
For example: {{mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root}}

We can add more integration tests which can check scalability and performance.
Having these tests here benefits everyone in the community because anyone can run 
them against their cluster.

Attaching a work-in-progress patch.
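A hedged sketch of what one such test might look like, assuming an {{IntegrationTest}} marker interface and the system properties mentioned above (names are illustrative, not necessarily the WIP patch's):
{code}
// Illustrative only: a JUnit 4 test in the IntegrationTest category, reading
// the target cluster from system properties as described above.
import org.junit.Test;
import org.junit.experimental.categories.Category;
import static org.junit.Assert.assertNotNull;

@Category(IntegrationTest.class)   // marker interface excluded from surefire runs
public class TestYarnServiceApiIT {

  private final String rmHost = System.getProperty("rm.host", "localhost");
  private final String user = System.getProperty("user.name");

  @Test
  public void testServiceRestEndpointReachable() throws Exception {
    // ... build the REST URL from rmHost, submit a service spec, and assert on
    // the response; the details depend on the cluster under test ...
    assertNotNull(rmHost);
    assertNotNull(user);
  }
}
{code}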



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8341) Yarn Service: Integration tests

2018-05-22 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8341:

Attachment: (was: YARN-8341.wip.patch)

> Yarn Service: Integration tests 
> 
>
> Key: YARN-8341
> URL: https://issues.apache.org/jira/browse/YARN-8341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8341.wip.patch
>
>
> In order to test the rest api end-to-end, we can add Integration tests for 
> Yarn service api. 
> The integration tests 
> * belong to junit category {{IntegrationTest}}.
> * will be only run when triggered by executing {{mvn 
> failsafe:integration-test}}
> * the surefire plugin for regular tests excludes {{IntegrationTest}}
> * RM host, user name, and any additional properties which are needed to 
> execute the tests against a cluster can be passed as System properties.
> For eg. {{mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root}}
> We can add more integration tests which can check scalability and performance.
> Having these tests here benefits everyone in the community because anyone can 
> run them against their cluster.
> Attaching a work in progress patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8290) SystemMetricsPublisher.appACLsUpdated should be invoked after application information is published to ATS to avoid "User is not set in the application report" Exception

2018-05-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8290:
-
Summary: SystemMetricsPublisher.appACLsUpdated should be invoked after 
application information is published to ATS to avoid "User is not set in the 
application report" Exception  (was: SystemMetricsPublisher.appACLsUpdated 
should be invoked after application information is published to ATS to avoid )

> SystemMetricsPublisher.appACLsUpdated should be invoked after application 
> information is published to ATS to avoid "User is not set in the application 
> report" Exception
> 
>
> Key: YARN-8290
> URL: https://issues.apache.org/jira/browse/YARN-8290
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8290.001.patch, YARN-8290.002.patch, 
> YARN-8290.003.patch, YARN-8290.004.patch
>
>
> Scenario:
> 1) Start 5 streaming applications in the background
> 2) Kill the active RM and cause an RM failover
> After the RM failover, the application failed with the error below.
> {code}18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception on [rm2] : 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1517520038847_0003' doesn't exist in RM. Please check 
> that the job submission was successful.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
> , so propagating back to caller.
> 18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application 
> application_1517520038847_0003
> 18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1517520038847_0003
> 18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is 
> not set in the application report
> Streaming Command Failed!{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8290) SystemMetricsPublisher.appACLsUpdated should be invoked after application information is published to ATS to avoid "User is not set in the application report" Exception

2018-05-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8290:
-
Priority: Critical  (was: Major)

> SystemMetricsPublisher.appACLsUpdated should be invoked after application 
> information is published to ATS to avoid "User is not set in the application 
> report" Exception
> 
>
> Key: YARN-8290
> URL: https://issues.apache.org/jira/browse/YARN-8290
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Critical
> Attachments: YARN-8290.001.patch, YARN-8290.002.patch, 
> YARN-8290.003.patch, YARN-8290.004.patch
>
>
> Scenario:
> 1) Start 5 streaming applications in the background
> 2) Kill the active RM and cause an RM failover
> After the RM failover, the application failed with the error below.
> {code}18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception on [rm2] : 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1517520038847_0003' doesn't exist in RM. Please check 
> that the job submission was successful.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
> , so propagating back to caller.
> 18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application 
> application_1517520038847_0003
> 18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1517520038847_0003
> 18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is 
> not set in the application report
> Streaming Command Failed!{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8341) Yarn Service: Integration tests

2018-05-22 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8341:

Attachment: YARN-8341.wip.patch

> Yarn Service: Integration tests 
> 
>
> Key: YARN-8341
> URL: https://issues.apache.org/jira/browse/YARN-8341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8341.wip.patch, YARN-8341.wip.patch
>
>
> In order to test the rest api end-to-end, we can add Integration tests for 
> Yarn service api. 
> The integration tests 
> * belong to junit category {{IntegrationTest}}.
> * will be only run when triggered by executing {{mvn 
> failsafe:integration-test}}
> * the surefire plugin for regular tests excludes {{IntegrationTest}}
> * RM host, user name, and any additional properties which are needed to 
> execute the tests against a cluster can be passed as System properties.
> For eg. {{mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root}}
> We can add more integration tests which can check scalability and performance.
> Having these tests here benefits everyone in the community because anyone can 
> run them against their cluster.
> Attaching a work in progress patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded

2018-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484638#comment-16484638
 ] 

Hudson commented on YARN-8273:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14255 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14255/])
YARN-8273. Log aggregation does not warn if HDFS quota in target (rkanter: rev 
b22f56c4719e63bd4f6edc2a075e0bcdb9442255)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationDFSException.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestAppLogAggregatorImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/tfile/LogAggregationTFileController.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestContainerLogsUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java


> Log aggregation does not warn if HDFS quota in target directory is exceeded
> ---
>
> Key: YARN-8273
> URL: https://issues.apache.org/jira/browse/YARN-8273
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8273.000.patch, YARN-8273.001.patch, 
> YARN-8273.002.patch, YARN-8273.003.patch, YARN-8273.004.patch, 
> YARN-8273.005.patch, YARN-8273.006.patch
>
>
> It appears that if an HDFS space quota is set on a target directory for log 
> aggregation and the quota is already exceeded when log aggregation is 
> attempted, zero-byte log files will be written to the HDFS directory, however 
> NodeManager logs do not reflect a failure to write the files successfully 
> (i.e. there are no ERROR or WARN messages to this effect).
> An improvement may be worth investigating to alert users to this scenario, as 
> otherwise logs for a YARN application may be missing both on HDFS and locally 
> (after local log cleanup is done) and the user may not otherwise be informed.
> Steps to reproduce:
> * Set a small HDFS space quota on /tmp/logs/username/logs (e.g. 2MB)
> * Write files to HDFS such that /tmp/logs/username/logs is almost 2MB full
> * Run a Spark or MR job in the cluster
> * Observe that zero byte files are written to HDFS after job completion
> * Observe that YARN container logs are also not present on the NM hosts (or 
> are deleted after yarn.nodemanager.delete.debug-delay-sec)
> * Observe that no ERROR or WARN messages appear to be logged in the NM role 
> log
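The committed file list above includes a new {{LogAggregationDFSException}}; a hedged sketch of the general idea (simplified, not the actual code) is to catch the DFS quota error when the aggregated log writer is closed and surface it instead of swallowing it:
{code}
// Sketch only: surface DFS-side failures (e.g. quota exceeded) that were
// previously swallowed when the aggregated log writer was closed.
try {
  logWriter.close();
} catch (DSQuotaExceededException e) {
  // org.apache.hadoop.hdfs.protocol.DSQuotaExceededException
  LOG.warn("Log aggregation for " + appId + " failed: HDFS space quota "
      + "exceeded in the target directory", e);
  throw new LogAggregationDFSException(e);   // propagated so callers can report it
}
{code}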



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-22 Thread Botong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484651#comment-16484651
 ] 

Botong Huang edited comment on YARN-8334 at 5/22/18 10:03 PM:
--

I realized that I confused destroy() with finalize() earlier. +1 on the patch 
pending the findbugs warning. You can basically remove the if (client != 
null) check. 


was (Author: botong):
I realized that I confused destroy() with finalize() earlier. +1 on the patch 
pending on the findbug warning. 

> Fix potential connection leak in GPGUtils
> -
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484674#comment-16484674
 ] 

Jason Lowe commented on YARN-8292:
--

Thanks for updating the patch!

Why does isAnyMajorResourceZeroOrNegative explicitly use a floating point zero 
constant and force the implicit conversion of the getMemorySize() result from a 
long to a float?  This is done in a few other places in 
DefaultResourceCalculator and they all seem like wasteful conversions to me.
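For illustration, a check like the one being discussed can avoid the float conversion entirely by comparing each dimension as a long; a hedged sketch (simplified, and dropping the ResourceCalculator parameter the real method takes):
{code}
// Sketch only: true if any dimension of the resource is zero or negative.
// Memory and vcores are compared as longs; no float conversion is needed.
public static boolean isAnyMajorResourceZeroOrNegative(Resource resource) {
  for (ResourceInformation info : resource.getResources()) {
    if (info.getValue() <= 0) {
      return true;
    }
  }
  return false;
}
{code}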

The logger that was added to AbstractPreemptableResourceCalculator is not used. 
 Also I'm curious why commons logging was used here instead of SLF4J.

stepFactor is a constant that should be precomputed in the 
AbstractPreemptableResourceCalculator constructor rather than computing it from 
scratch each time.

Do we really want to use Resources.lessThanOrEqual(rc, totGuarant, unassigned, 
none) here?  For DRF that requires computing shares in each resource dimension 
for both resources, which is relatively expensive.  I think 
Resources.fitsIn(unassigned, none) is more along the lines of what is called for 
here (although fitsIn does some unit checking and conversions we don't want 
either).  Really what we want is something like an isAnyMajorResourceRequested() 
which returns true if any resource dimension is > 0.  Not a fan of the proposed 
method name, but hopefully it gets across what I'm talking about here.  Of 
course, if we're going to always componentwiseMax unassigned with 
Resources.none() to make sure no resource dimension in unassigned can ever go 
negative, then the check can be simplified to if 
(Resources.none().equals(unassigned)).

Similar "do we really want a full DRF comparison here" comment for the 
Resources.greaterThan(rc, clusterResource, toObtainByPartition, 
Resources.none()) check and the Resources.lessThan check that occurs a bit 
later.

The comment says:
{code}
   *  When true:
   *stop preempt container when any resource type < 0 for to-
   *preempt.
{code}
but the code will stop preempting if any resource dimension <= 0 since it does:
{code}
  if (conservativeDRF) {
doPreempt = !Resources.isAnyMajorResourceZeroOrNegative(rc,
toObtainByPartition);
{code}
I agree with Eric that this essentially means conservativeDRF is badly broken 
if there is a resource dimension that is not requested by every container, and 
that raises the question of whether it makes sense to make conservativeDRF the 
default.

It would be good to cleanup the unused imports as flagged by checkstyle.


> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There're 3 resource types. Total resource of the cluster is 30:18:6
> For both of a/b, there're 3 containers running, each of container is 2:2:1.
> Queue c uses 0 resource, and have 1:1:1 pending resource.
> Under existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484735#comment-16484735
 ] 

Wangda Tan commented on YARN-8292:
--

Thanks [~jlowe], all great comments. I addressed all of them in patch 007; 
please let me know if you have any other comments.

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch, YARN-8292.007.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types, and the total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resource and has 1:1:1 pending resource.
> Under the existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8292:
-
Attachment: YARN-8292.007.patch

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch, YARN-8292.007.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types, and the total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resource and has 1:1:1 pending resource.
> Under the existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8290) SystemMetricsPublisher.appACLsUpdated should be invoked after application information is published to ATS to avoid "User is not set in the application report" Exception

2018-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484586#comment-16484586
 ] 

Hudson commented on YARN-8290:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14254 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14254/])
YARN-8290. SystemMetricsPublisher.appACLsUpdated should be invoked after 
(wangda: rev bd15d2396ef0c24fb6b60c6393d16b37651b828e)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java


> SystemMetricsPublisher.appACLsUpdated should be invoked after application 
> information is published to ATS to avoid "User is not set in the application 
> report" Exception
> 
>
> Key: YARN-8290
> URL: https://issues.apache.org/jira/browse/YARN-8290
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8290.001.patch, YARN-8290.002.patch, 
> YARN-8290.003.patch, YARN-8290.004.patch
>
>
> Scenario:
> 1) Start 5 streaming application in background
> 2) Kill Active RM and cause RM failover
> After RM failover, The application failed with below error.
> {code}18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception on [rm2] : 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1517520038847_0003' doesn't exist in RM. Please check 
> that the job submission was successful.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
> , so propagating back to caller.
> 18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application 
> application_1517520038847_0003
> 18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1517520038847_0003
> 18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is 
> not set in the application report
> Streaming Command Failed!{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded

2018-05-22 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484588#comment-16484588
 ] 

Robert Kanter commented on YARN-8273:
-

+1 LGTM

> Log aggregation does not warn if HDFS quota in target directory is exceeded
> ---
>
> Key: YARN-8273
> URL: https://issues.apache.org/jira/browse/YARN-8273
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-8273.000.patch, YARN-8273.001.patch, 
> YARN-8273.002.patch, YARN-8273.003.patch, YARN-8273.004.patch, 
> YARN-8273.005.patch, YARN-8273.006.patch
>
>
> It appears that if an HDFS space quota is set on a target directory for log 
> aggregation and the quota is already exceeded when log aggregation is 
> attempted, zero-byte log files will be written to the HDFS directory; however, 
> NodeManager logs do not reflect the failure to write the files successfully 
> (i.e. there are no ERROR or WARN messages to this effect).
> An improvement may be worth investigating to alert users to this scenario, as 
> otherwise logs for a YARN application may be missing both on HDFS and locally 
> (after local log cleanup is done) and the user may not otherwise be informed.
> Steps to reproduce:
> * Set a small HDFS space quota on /tmp/logs/username/logs (e.g. 2MB)
> * Write files to HDFS such that /tmp/logs/username/logs is almost 2MB full
> * Run a Spark or MR job in the cluster
> * Observe that zero byte files are written to HDFS after job completion
> * Observe that YARN container logs are also not present on the NM hosts (or 
> are deleted after yarn.nodemanager.delete.debug-delay-sec)
> * Observe that no ERROR or WARN messages appear to be logged in the NM role 
> log



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8332) Incorrect min/max allocation property name in resource types doc

2018-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484587#comment-16484587
 ] 

Hudson commented on YARN-8332:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14254 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14254/])
YARN-8332. Incorrect min/max allocation property name in resource types 
(wangda: rev 83f53e5c6236de30c213dc41878cebfb02597e26)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceModel.md


> Incorrect min/max allocation property name in resource types doc
> 
>
> Key: YARN-8332
> URL: https://issues.apache.org/jira/browse/YARN-8332
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8332.001.patch
>
>
> It should be
> {noformat}
> yarn.resource-types.<resource>.minimum-allocation
> yarn.resource-types.<resource>.maximum-allocation
> {noformat}
> instead of
> {noformat}
> yarn.resource-types.<resource>.minimum
> yarn.resource-types.<resource>.maximum
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8343) YARN should have ability to run images only from a whitelist docker registries

2018-05-22 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8343:


 Summary: YARN should have ability to run images only from a 
whitelist docker registries
 Key: YARN-8343
 URL: https://issues.apache.org/jira/browse/YARN-8343
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan


This is a superset of docker.privileged-containers.registries: the admin can 
specify a whitelist, and all images from registries outside that whitelist 
will be rejected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-05-22 Thread Hsin-Liang Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484721#comment-16484721
 ] 

Hsin-Liang Huang commented on YARN-8326:


[~eyang], this afternoon I tried the command and the performance improved 
dramatically. It used to take 8 seconds and now it consistently runs in 3 
seconds. I then compared it with the other HDP 3.0 cluster, where I did not 
make the property changes you suggested, and that cluster still consistently 
takes 8 seconds. I am going to run our test cases to see whether the 
performance improves there as well.

> Yarn 3.0 seems runs slower than Yarn 2.6
> 
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
> Environment: This is the yarn-site.xml for 3.0. 
>  
> 
> 
>  hadoop.registry.dns.bind-port
>  5353
>  
> 
>  hadoop.registry.dns.domain-name
>  hwx.site
>  
> 
>  hadoop.registry.dns.enabled
>  true
>  
> 
>  hadoop.registry.dns.zone-mask
>  255.255.255.0
>  
> 
>  hadoop.registry.dns.zone-subnet
>  172.17.0.0
>  
> 
>  manage.include.files
>  false
>  
> 
>  yarn.acl.enable
>  false
>  
> 
>  yarn.admin.acl
>  yarn
>  
> 
>  yarn.client.nodemanager-connect.max-wait-ms
>  6
>  
> 
>  yarn.client.nodemanager-connect.retry-interval-ms
>  1
>  
> 
>  yarn.http.policy
>  HTTP_ONLY
>  
> 
>  yarn.log-aggregation-enable
>  false
>  
> 
>  yarn.log-aggregation.retain-seconds
>  2592000
>  
> 
>  yarn.log.server.url
>  
> [http://xx:19888/jobhistory/logs|http://whiny2.fyre.ibm.com:19888/jobhistory/logs]
>  
> 
>  yarn.log.server.web-service.url
>  
> [http://xx:8188/ws/v1/applicationhistory|http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory]
>  
> 
>  yarn.node-labels.enabled
>  false
>  
> 
>  yarn.node-labels.fs-store.retry-policy-spec
>  2000, 500
>  
> 
>  yarn.node-labels.fs-store.root-dir
>  /system/yarn/node-labels
>  
> 
>  yarn.nodemanager.address
>  0.0.0.0:45454
>  
> 
>  yarn.nodemanager.admin-env
>  MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX
>  
> 
>  yarn.nodemanager.aux-services
>  mapreduce_shuffle,spark2_shuffle,timeline_collector
>  
> 
>  yarn.nodemanager.aux-services.mapreduce_shuffle.class
>  org.apache.hadoop.mapred.ShuffleHandler
>  
> 
>  yarn.nodemanager.aux-services.spark2_shuffle.class
>  org.apache.spark.network.yarn.YarnShuffleService
>  
> 
>  yarn.nodemanager.aux-services.spark2_shuffle.classpath
>  /usr/spark2/aux/*
>  
> 
>  yarn.nodemanager.aux-services.spark_shuffle.class
>  org.apache.spark.network.yarn.YarnShuffleService
>  
> 
>  yarn.nodemanager.aux-services.timeline_collector.class
>  
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService
>  
> 
>  yarn.nodemanager.bind-host
>  0.0.0.0
>  
> 
>  yarn.nodemanager.container-executor.class
>  
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
>  
> 
>  yarn.nodemanager.container-metrics.unregister-delay-ms
>  6
>  
> 
>  yarn.nodemanager.container-monitor.interval-ms
>  3000
>  
> 
>  yarn.nodemanager.delete.debug-delay-sec
>  0
>  
> 
>  
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
>  90
>  
> 
>  yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
>  1000
>  
> 
>  yarn.nodemanager.disk-health-checker.min-healthy-disks
>  0.25
>  
> 
>  yarn.nodemanager.health-checker.interval-ms
>  135000
>  
> 
>  yarn.nodemanager.health-checker.script.timeout-ms
>  6
>  
> 
>  
> yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage
>  false
>  
> 
>  yarn.nodemanager.linux-container-executor.group
>  hadoop
>  
> 
>  
> yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users
>  false
>  
> 
>  yarn.nodemanager.local-dirs
>  /hadoop/yarn/local
>  
> 
>  yarn.nodemanager.log-aggregation.compression-type
>  gz
>  
> 
>  yarn.nodemanager.log-aggregation.debug-enabled
>  false
>  
> 
>  yarn.nodemanager.log-aggregation.num-log-files-per-app
>  30
>  
> 
>  
> yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds
>  3600
>  
> 
>  yarn.nodemanager.log-dirs
>  /hadoop/yarn/log
>  
> 
>  yarn.nodemanager.log.retain-seconds
>  604800
>  
> 
>  yarn.nodemanager.pmem-check-enabled
>  false
>  
> 
>  yarn.nodemanager.recovery.dir
>  /var/log/hadoop-yarn/nodemanager/recovery-state
>  
> 
>  yarn.nodemanager.recovery.enabled
>  true
>  
> 
>  yarn.nodemanager.recovery.supervised
>  true
>  
> 
>  yarn.nodemanager.remote-app-log-dir
>  /app-logs
>  
> 
>  yarn.nodemanager.remote-app-log-dir-suffix
>  logs
>  
> 
>  yarn.nodemanager.resource-plugins
>  
>  
> 
>  yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices
>  auto
>  
> 
>  yarn.nodemanager.resource-plugins.gpu.docker-plugin
>  nvidia-docker-v1
>  
> 
>  

[jira] [Commented] (YARN-8316) Diagnostic message should improve when yarn service fails to launch due to ATS unavailability

2018-05-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16485766#comment-16485766
 ] 

genericqa commented on YARN-8316:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 28m 
37s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 85m 48s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8316 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924638/YARN-8316.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 786f4b1ac210 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 43be9ab |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20829/testReport/ |
| Max. process+thread count | 703 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20829/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Diagnostic message should improve when yarn service fails to launch due to 
> ATS unavailability
> 

[jira] [Updated] (YARN-8341) Yarn Service: Integration tests

2018-05-22 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8341:

Attachment: YARN-8341.wip.patch

> Yarn Service: Integration tests 
> 
>
> Key: YARN-8341
> URL: https://issues.apache.org/jira/browse/YARN-8341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8341.wip.patch
>
>
> In order to test the rest api end-to-end, we can add Integration tests for 
> Yarn service api. 
> The integration tests 
> * belong to junit category {{IntegrationTest}}.
> * will only be run when triggered by executing {{mvn 
> failsafe:integration-test}}
> * the surefire plugin for regular tests excludes {{IntegrationTest}}
> * RM host, user name, and any additional properties which are needed to 
> execute the tests against a cluster can be passed as System properties.
> For example: {{mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root}}
> We can add more integration tests which can check scalability and performance.
> Having these tests here benefits everyone in the community because anyone can 
> run them against their own cluster. 
> Attaching a work-in-progress patch.
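
For illustration, a minimal sketch of what one such test might look like, 
assuming a marker interface named IntegrationTest and the rm.host/user.name 
system properties mentioned in the description (these names come from the 
description, not from the attached work-in-progress patch):
{code}
import static org.junit.Assert.assertNotNull;

import org.junit.Test;
import org.junit.experimental.categories.Category;

// Marker interface used by surefire/failsafe to include or exclude the test.
interface IntegrationTest {
}

@Category(IntegrationTest.class)
public class TestYarnServiceApiIT {

  @Test
  public void testClusterPropertiesArePassedIn() {
    // RM host and user are supplied on the command line, e.g.
    //   mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root
    String rmHost = System.getProperty("rm.host", "localhost");
    String user = System.getProperty("user.name");
    assertNotNull(rmHost);
    assertNotNull(user);
    // A real test would build the service REST URL from rmHost and exercise
    // the YARN service API end to end.
  }
}
{code}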



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8316) Diagnostic message should improve when yarn service fails to launch due to ATS unavailability

2018-05-22 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8316:
-
Attachment: YARN-8316.001.patch

> Diagnostic message should improve when yarn service fails to launch due to 
> ATS unavailability
> -
>
> Key: YARN-8316
> URL: https://issues.apache.org/jira/browse/YARN-8316
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8316.001.patch
>
>
> Scenario:
> 1) shutdown ATS
> 2) launch yarn service.
> The yarn service launch command fails with the stack trace below. There is no 
> diagnostic message available in the response.
> {code:java}
> bash-4.2$ yarn app -launch hbase-sec /tmp/hbase-secure.yar 
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> 18/05/17 13:24:43 INFO client.RMProxy: Connecting to ResourceManager at 
> xxx/xxx:8050
> 18/05/17 13:24:44 INFO client.AHSProxy: Connecting to Application History 
> server at localhost/xxx:10200
> 18/05/17 13:24:44 INFO client.RMProxy: Connecting to ResourceManager at 
> xxx/xxx:8050
> 18/05/17 13:24:44 INFO client.AHSProxy: Connecting to Application History 
> server at localhost/127.0.0.1:10200
> 18/05/17 13:24:44 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /tmp/hbase-secure.yar
> 18/05/17 13:26:06 ERROR client.ApiServiceClient: 
> bash-4.2$ echo $?
> 56{code}
> The error message should surface the underlying ConnectionRefused exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-22 Thread Botong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484651#comment-16484651
 ] 

Botong Huang edited comment on YARN-8334 at 5/22/18 10:01 PM:
--

I realized that I confused destroy() with finalize() earlier. +1 on the patch, 
pending the findbugs warning.


was (Author: botong):
I realized that I confused destroy() with finalize() earlier. +1 on the patch. 

> Fix potential connection leak in GPGUtils
> -
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.
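
For reference, a minimal sketch of the leak-free pattern the description calls 
for, assuming the Jersey 1.x Client/ClientResponse API; the REST URL is only a 
placeholder:
{code}
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;

public class ClosingClientExample {
  public static void main(String[] args) {
    Client client = Client.create();
    ClientResponse response = null;
    try {
      response = client.resource("http://rm-host:8088/ws/v1/cluster/info")
          .get(ClientResponse.class);
      System.out.println("HTTP status: " + response.getStatus());
    } finally {
      // Close the response to release the underlying connection, then
      // destroy the client so its connection manager is shut down.
      if (response != null) {
        response.close();
      }
      client.destroy();
    }
  }
}
{code}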



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8339) Service AM should localize static/archive resource types to container working directory instead of 'resources'

2018-05-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8339:
-
Target Version/s: 3.1.1
Priority: Critical  (was: Major)

> Service AM should localize static/archive resource types to container working 
> directory instead of 'resources' 
> ---
>
> Key: YARN-8339
> URL: https://issues.apache.org/jira/browse/YARN-8339
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8339.1.patch
>
>
> This is to address one of the review comments posted by [~wangda] in 
> YARN-8079 at 
> https://issues.apache.org/jira/browse/YARN-8079?focusedCommentId=16482065=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16482065



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8339) Service AM should localize static/archive resource types to container working directory instead of 'resources'

2018-05-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484538#comment-16484538
 ] 

Wangda Tan commented on YARN-8339:
--

Thanks [~suma.shivaprasad] for the quick fix. The patch LGTM; I just triggered 
another Jenkins run and will commit after that.

> Service AM should localize static/archive resource types to container working 
> directory instead of 'resources' 
> ---
>
> Key: YARN-8339
> URL: https://issues.apache.org/jira/browse/YARN-8339
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8339.1.patch
>
>
> This is to address one of the review comments posted by [~wangda] in 
> YARN-8079 at 
> https://issues.apache.org/jira/browse/YARN-8079?focusedCommentId=16482065=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16482065



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-05-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16485752#comment-16485752
 ] 

Vinod Kumar Vavilapalli commented on YARN-8338:
---

[~jlowe], yeah, it wasn't easy to figure out who / what depends on that JAR.

There is no easy way to unit test your approach; let me try it in a cluster.

> TimelineService V1.5 doesn't come up after HADOOP-15406
> ---
>
> Key: YARN-8338
> URL: https://issues.apache.org/jira/browse/YARN-8338
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: YARN-8338.txt
>
>
> TimelineService V1.5 fails with the following:
> {code}
> java.lang.NoClassDefFoundError: org/objenesis/Objenesis
>   at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.(RollingLevelDBTimelineStore.java:174)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8290) SystemMetricsPublisher.appACLsUpdated should be invoked after application information is published to ATS to avoid

2018-05-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8290:
-
Summary: SystemMetricsPublisher.appACLsUpdated should be invoked after 
application information is published to ATS to avoid   (was: Yarn application 
failed to recover with "Error Launching job : User is not set in the 
application report" error after RM restart)

> SystemMetricsPublisher.appACLsUpdated should be invoked after application 
> information is published to ATS to avoid 
> ---
>
> Key: YARN-8290
> URL: https://issues.apache.org/jira/browse/YARN-8290
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8290.001.patch, YARN-8290.002.patch, 
> YARN-8290.003.patch, YARN-8290.004.patch
>
>
> Scenario:
> 1) Start 5 streaming application in background
> 2) Kill Active RM and cause RM failover
> After RM failover, The application failed with below error.
> {code}18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception on [rm2] : 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1517520038847_0003' doesn't exist in RM. Please check 
> that the job submission was successful.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
> , so propagating back to caller.
> 18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application 
> application_1517520038847_0003
> 18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1517520038847_0003
> 18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is 
> not set in the application report
> Streaming Command Failed!{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8332) Incorrect min/max allocation property name in resource types doc

2018-05-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484513#comment-16484513
 ] 

Wangda Tan commented on YARN-8332:
--

+1, committing ..

> Incorrect min/max allocation property name in resource types doc
> 
>
> Key: YARN-8332
> URL: https://issues.apache.org/jira/browse/YARN-8332
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: YARN-8332.001.patch
>
>
> It should be
> {noformat}
> yarn.resource-types.<resource>.minimum-allocation
> yarn.resource-types.<resource>.maximum-allocation
> {noformat}
> instead of
> {noformat}
> yarn.resource-types.<resource>.minimum
> yarn.resource-types.<resource>.maximum
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8343) YARN should have ability to run images only from a whitelist docker registries

2018-05-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484695#comment-16484695
 ] 

Wangda Tan commented on YARN-8343:
--

cc: [~shaneku...@gmail.com], [~eyang], [~ebadger], [~jlowe]

> YARN should have ability to run images only from a whitelist docker registries
> --
>
> Key: YARN-8343
> URL: https://issues.apache.org/jira/browse/YARN-8343
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Critical
>
> This is a superset of docker.privileged-containers.registries: the admin can 
> specify a whitelist, and all images from registries outside that whitelist 
> will be rejected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


