[jira] [Commented] (YARN-9099) GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way

2019-01-28 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754589#comment-16754589
 ] 

Sunil Govindan commented on YARN-9099:
--

+1. Lets get this in.

Committing shortly if no objections.

> GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way
> --
>
> Key: YARN-9099
> URL: https://issues.apache.org/jira/browse/YARN-9099
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9099.001.patch, YARN-9099.002.patch
>
>
> getReleasingGpus plays an important role in the calculation which happens 
> when GpuAllocator assign GPUs to a container, see: 
> GpuResourceAllocator#internalAssignGpus.
> If multiple GPUs are assigned to the same container, getReleasingGpus will 
> return an invalid number.
> The iterator goes over on mappings of (GPU device, container ID) and it 
> retrieves the container by its ID the number of times the container ID is 
> mapped to any device.
> Then for every container, the resource value for the GPU resource is added to 
> a running sum.
> Obviously, if a container is mapped to 2 or more devices, then the 
> container's GPU resource counter is added to the running sum as many times as 
> the number of GPU devices the container has.
> Example: 
> Let's suppose {{usedDevices}} contains these mappings: 
> - (GPU1, container1)
> - (GPU2, container1)
> - (GPU3, container2)
> GPU resource value is 2 for container1 and 
> GPU resource value is 1 for container2.
> Then, if container1 is in a running state, getReleasingGpus will return 4 
> instead of 2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9099) GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way

2019-01-28 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754568#comment-16754568
 ] 

Zhankun Tang commented on YARN-9099:


[~pbacsko], [~snemeth] . Since it's a quite straight forward fix. I think we 
may get it in without test case?

Any comment, [~sunilg] ?

> GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way
> --
>
> Key: YARN-9099
> URL: https://issues.apache.org/jira/browse/YARN-9099
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9099.001.patch, YARN-9099.002.patch
>
>
> getReleasingGpus plays an important role in the calculation which happens 
> when GpuAllocator assign GPUs to a container, see: 
> GpuResourceAllocator#internalAssignGpus.
> If multiple GPUs are assigned to the same container, getReleasingGpus will 
> return an invalid number.
> The iterator goes over on mappings of (GPU device, container ID) and it 
> retrieves the container by its ID the number of times the container ID is 
> mapped to any device.
> Then for every container, the resource value for the GPU resource is added to 
> a running sum.
> Obviously, if a container is mapped to 2 or more devices, then the 
> container's GPU resource counter is added to the running sum as many times as 
> the number of GPU devices the container has.
> Example: 
> Let's suppose {{usedDevices}} contains these mappings: 
> - (GPU1, container1)
> - (GPU2, container1)
> - (GPU3, container2)
> GPU resource value is 2 for container1 and 
> GPU resource value is 1 for container2.
> Then, if container1 is in a running state, getReleasingGpus will return 4 
> instead of 2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9099) GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way

2019-01-24 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751126#comment-16751126
 ] 

Peter Bacsko commented on YARN-9099:


[~snemeth] as [~tangzhankun] pointed out, is it possible to add a unit test for 
this?

> GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way
> --
>
> Key: YARN-9099
> URL: https://issues.apache.org/jira/browse/YARN-9099
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9099.001.patch, YARN-9099.002.patch
>
>
> getReleasingGpus plays an important role in the calculation which happens 
> when GpuAllocator assign GPUs to a container, see: 
> GpuResourceAllocator#internalAssignGpus.
> If multiple GPUs are assigned to the same container, getReleasingGpus will 
> return an invalid number.
> The iterator goes over on mappings of (GPU device, container ID) and it 
> retrieves the container by its ID the number of times the container ID is 
> mapped to any device.
> Then for every container, the resource value for the GPU resource is added to 
> a running sum.
> Obviously, if a container is mapped to 2 or more devices, then the 
> container's GPU resource counter is added to the running sum as many times as 
> the number of GPU devices the container has.
> Example: 
> Let's suppose {{usedDevices}} contains these mappings: 
> - (GPU1, container1)
> - (GPU2, container1)
> - (GPU3, container2)
> GPU resource value is 2 for container1 and 
> GPU resource value is 1 for container2.
> Then, if container1 is in a running state, getReleasingGpus will return 4 
> instead of 2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9099) GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way

2018-12-18 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723882#comment-16723882
 ] 

Szilard Nemeth commented on YARN-9099:
--

[~devaraj.k], [~tangzhankun], [~zyluo], [~sunilg], [~leftnoteasy]: As you have 
a lot expertise in this area, could you please help me to review (and hopefully 
commit) this? 

Thanks!

> GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way
> --
>
> Key: YARN-9099
> URL: https://issues.apache.org/jira/browse/YARN-9099
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9099.001.patch, YARN-9099.002.patch
>
>
> getReleasingGpus plays an important role in the calculation which happens 
> when GpuAllocator assign GPUs to a container, see: 
> GpuResourceAllocator#internalAssignGpus.
> If multiple GPUs are assigned to the same container, getReleasingGpus will 
> return an invalid number.
> The iterator goes over on mappings of (GPU device, container ID) and it 
> retrieves the container by its ID the number of times the container ID is 
> mapped to any device.
> Then for every container, the resource value for the GPU resource is added to 
> a running sum.
> Obviously, if a container is mapped to 2 or more devices, then the 
> container's GPU resource counter is added to the running sum as many times as 
> the number of GPU devices the container has.
> Example: 
> Let's suppose {{usedDevices}} contains these mappings: 
> - (GPU1, container1)
> - (GPU2, container1)
> - (GPU3, container2)
> GPU resource value is 2 for container1 and 
> GPU resource value is 1 for container2.
> Then, if container1 is in a running state, getReleasingGpus will return 4 
> instead of 2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9099) GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way

2018-12-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722985#comment-16722985
 ] 

Hadoop QA commented on YARN-9099:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m  
4s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 78m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9099 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12952031/YARN-9099.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d181ec288eea 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 
31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 346c0c8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/22895/testReport/ |
| Max. process+thread count | 313 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/22895/console |
| Powered by | Apache Yetus 0.8.0   

[jira] [Commented] (YARN-9099) GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way

2018-12-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714573#comment-16714573
 ] 

Hadoop QA commented on YARN-9099:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 49s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 43s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m 51s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9099 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951132/YARN-9099.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9410d06631e4 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 17a8708 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/22822/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit | 

[jira] [Commented] (YARN-9099) GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way

2018-12-10 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714513#comment-16714513
 ] 

Zhankun Tang commented on YARN-9099:


[~snemeth], Thanks for catching up this! The patch looks good to me. And a test 
case would be better.

> GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way
> --
>
> Key: YARN-9099
> URL: https://issues.apache.org/jira/browse/YARN-9099
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9099.001.patch
>
>
> getReleasingGpus plays an important role in the calculation which happens 
> when GpuAllocator assign GPUs to a container, see: 
> GpuResourceAllocator#internalAssignGpus.
> If multiple GPUs are assigned to the same container, getReleasingGpus will 
> return an invalid number.
> The iterator goes over on mappings of (GPU device, container ID) and it 
> retrieves the container by its ID the number of times the container ID is 
> mapped to any device.
> Then for every container, the resource value for the GPU resource is added to 
> a running sum.
> Obviously, if a container is mapped to 2 or more devices, then the 
> container's GPU resource counter is added to the running sum as many times as 
> the number of GPU devices the container has.
> Example: 
> Let's suppose {{usedDevices}} contains these mappings: 
> - (GPU1, container1)
> - (GPU2, container1)
> - (GPU3, container2)
> GPU resource value is 2 for container1 and 
> GPU resource value is 1 for container2.
> Then, if container1 is in a running state, getReleasingGpus will return 4 
> instead of 2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org