[jira] [Assigned] (YARN-10487) Support getQueueUserAcls, listReservations, getApplicationAttempts, getContainerReport, getContainers, getResourceTypeInfo API's for Federation
[ https://issues.apache.org/jira/browse/YARN-10487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] D M Murali Krishna Reddy reassigned YARN-10487: --- Assignee: D M Murali Krishna Reddy > Support getQueueUserAcls, listReservations, getApplicationAttempts, > getContainerReport, getContainers, getResourceTypeInfo API's for Federation > --- > > Key: YARN-10487 > URL: https://issues.apache.org/jira/browse/YARN-10487 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Attachments: YARN-10487.001.patch > > > Support getQueueUserAcls, listReservations, getApplicationAttempts, > getContainerReport, getContainers, getResourceTypeInfo API's for Federation -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9883) Reshape SchedulerHealth class
[ https://issues.apache.org/jira/browse/YARN-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235959#comment-17235959 ] D M Murali Krishna Reddy commented on YARN-9883: [~BilwaST] I have uploaded the patch; can you review it? > Reshape SchedulerHealth class > - > > Key: YARN-9883 > URL: https://issues.apache.org/jira/browse/YARN-9883 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, yarn >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: D M Murali Krishna Reddy >Priority: Minor > Attachments: YARN-9883.001.patch > > > The {{SchedulerHealth}} class has some flaws, for example: > - It has no javadoc at all > - All its fields are package-private; they should be private > - The internal maps should be (Concurrent) EnumMaps instead of HashMaps, because EnumMaps are more efficient at storing enum keys > - schedulerHealthDetails only stores the last operation, so its name should reflect that (just like lastSchedulerRunDetails)
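The EnumMap suggestion above can be sketched as follows. This is a hypothetical class, not the actual {{SchedulerHealth}} code. The {{Operation}} values, the class name and the method names are all assumptions made for illustration.

Java has no {{ConcurrentEnumMap}}, so the sketch shows two common substitutes:
* an EnumMap that is fully populated in the constructor and holds {{AtomicLong}} values;
* an EnumMap wrapped with {{Collections.synchronizedMap}}, for entries that are replaced at runtime.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of an enum-keyed scheduler health tracker.
 * Not the real SchedulerHealth class; all names are illustrative.
 */
public class SchedulerHealthSketch {

    /** Kinds of scheduler operations tracked by this sketch (assumed values). */
    public enum Operation { ALLOCATION, RELEASE, RESERVATION, PREEMPTION }

    /**
     * Per-operation counters. Every key is inserted in the constructor and the
     * map is never structurally modified afterwards, so concurrent get() calls
     * are safe; the AtomicLong values make the increments thread-safe.
     */
    private final Map<Operation, AtomicLong> operationCounts =
        new EnumMap<>(Operation.class);

    /** Details of the most recent operation of each kind; replaced at runtime, hence synchronized. */
    private final Map<Operation, String> lastOperationDetails =
        Collections.synchronizedMap(new EnumMap<>(Operation.class));

    public SchedulerHealthSketch() {
        for (Operation op : Operation.values()) {
            operationCounts.put(op, new AtomicLong());
        }
    }

    /** Records one operation and remembers its details as the latest of that kind. */
    public void recordOperation(Operation op, String details) {
        operationCounts.get(op).incrementAndGet();
        lastOperationDetails.put(op, details);
    }

    /** Returns how many operations of the given kind have been recorded. */
    public long getCount(Operation op) {
        return operationCounts.get(op).get();
    }

    /** Returns the details of the last operation of this kind, or null if there was none. */
    public String getLastOperationDetails(Operation op) {
        return lastOperationDetails.get(op);
    }

    public static void main(String[] args) {
        SchedulerHealthSketch health = new SchedulerHealthSketch();
        health.recordOperation(Operation.ALLOCATION, "container_1 on node1");
        health.recordOperation(Operation.ALLOCATION, "container_2 on node2");
        System.out.println(health.getCount(Operation.ALLOCATION));                // 2
        System.out.println(health.getLastOperationDetails(Operation.ALLOCATION)); // container_2 on node2
    }
}
```

Naming the second map after the last operation follows the renaming point in the description.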
[jira] [Updated] (YARN-9883) Reshape SchedulerHealth class
[ https://issues.apache.org/jira/browse/YARN-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] D M Murali Krishna Reddy updated YARN-9883: --- Attachment: YARN-9883.001.patch
[jira] [Comment Edited] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235931#comment-17235931 ] angerszhu edited comment on YARN-10495 at 11/20/20, 6:49 AM: - [~ebadger] Double-checked: this can solve our problem. We add $HADOOP_HOME/lib/native to the rpath. was (Author: angerszhuuu): [~ebadger] Double check, this can solve our problem.
> make the rpath of container-executor configurable
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Attachments: YARN-10495.001.patch
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto to container-executor. We then hit a case where our Jenkins machine has libcrypto.so.1.0.0 in its shared library environment, but our NodeManager machines do not have libcrypto.so.1.0.0; they have *libcrypto.so.1.1* instead.
> We use an internal, custom dynamic link library environment, /usr/lib/x86_64-linux-gnu, and we build Hadoop with the following parameters:
> {code:java}
> -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>
> Shared library path /usr/lib/x86_64-linux-gnu (where libcrypto is located) on the Jenkins machine:
> {code:java}
> -rw-r--r-- 1 root root  240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root   54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root 4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root      18 Sep 26  2019 libcrypto.so -> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root 2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root      35 Jun 18  2017 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root     298 Jun 18  2017 libc.so
> {code}
>
> Shared library path /usr/lib/x86_64-linux-gnu (where libcrypto is located) on the NodeManager machine:
> {code:java}
> -rw-r--r-- 1 root root   55852 Feb  7  2019 libcrypt.a
> -rw-r--r-- 1 root root 4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx 1 root root      16 Sep 28  2019 libcrypto.so -> libcrypto.so.1.1
> -rw-r--r-- 1 root root 2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r-- 1 root root 2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx 1 root root      35 Feb  7  2019 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root     298 Feb  7  2019 libc.so
> {code}
> We build container-executor with
> Because the libcrypto.so versions differ, the NodeManager fails to start with the following error:
> {code:java}
> .. 3 more
> Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>     ... 4 more
> Caused by: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
>     at org.apache.hadoop.util.Shell.run(Shell.java:901)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>     ... 6 more
> {code}
>
> We should make the RPATH of container-executor configurable to solve this problem.
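The error above comes from the dynamic loader. container-executor was linked against libcrypto.so.1.0.0 on the Jenkins machine. None of the directories searched on the NodeManager host contains a file with that name.

The Java sketch below is a deliberately simplified, hypothetical model of that lookup. It checks the binary's RPATH directories first, then a list of default directories. Real ld.so also consults DT_RUNPATH, LD_LIBRARY_PATH and ld.so.cache.

For setuid programs such as container-executor, ld.so runs in secure-execution mode and ignores LD_LIBRARY_PATH. This is one reason why a directory embedded in the binary's RPATH (for example $HADOOP_HOME/lib/native, as in the comment above) is attractive. The class name and the temporary directories are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical, simplified model of a dynamic-loader lookup: search the
 * binary's RPATH directories first, then the default directories.
 * Real ld.so also uses DT_RUNPATH, LD_LIBRARY_PATH and ld.so.cache.
 */
public class RpathLookupSketch {

    /** Returns the first existing file named {@code soname}, or empty if none is found. */
    static Optional<Path> resolve(String soname, List<Path> rpathDirs, List<Path> defaultDirs) {
        for (List<Path> dirs : List.of(rpathDirs, defaultDirs)) {
            for (Path dir : dirs) {
                Path candidate = dir.resolve(soname);
                if (Files.isRegularFile(candidate)) {   // follows symlinks, like the loader
                    return Optional.of(candidate);
                }
            }
        }
        // Corresponds to "cannot open shared object file: No such file or directory".
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Mimic the NodeManager host: 1.0.2 and 1.1 are present, 1.0.0 is not.
        Path systemDir = Files.createTempDirectory("usr-lib");
        Files.createFile(systemDir.resolve("libcrypto.so.1.0.2"));
        Files.createFile(systemDir.resolve("libcrypto.so.1.1"));

        // Hypothetical stand-in for $HADOOP_HOME/lib/native holding a bundled copy.
        Path bundledDir = Files.createTempDirectory("hadoop-lib-native");
        Files.createFile(bundledDir.resolve("libcrypto.so.1.0.0"));

        String needed = "libcrypto.so.1.0.0";   // the version linked against on the build host
        System.out.println("without RPATH: " + resolve(needed, List.of(), List.of(systemDir)));   // Optional.empty
        System.out.println("with RPATH:    " + resolve(needed, List.of(bundledDir), List.of(systemDir)));
    }
}
```

In this model, only the RPATH entry makes the lookup succeed. The real loader behaves the same way only if nothing earlier in its search order supplies a matching file.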
[jira] [Comment Edited] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235931#comment-17235931 ] angerszhu edited comment on YARN-10495 at 11/20/20, 6:45 AM: - [~ebadger] Double check, this can solve our problem. was (Author: angerszhuuu): [~ebadger] Double check, this can solve our prblem.
[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235931#comment-17235931 ] angerszhu commented on YARN-10495: -- [~ebadger] Double check, this can solve our problem.
[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235890#comment-17235890 ] angerszhu commented on YARN-10495: -- [~ebadger] I have tested this patch in our environment, and I will confirm again whether there are other problems. Also, I am not familiar with Hadoop QA's results; it seems all unit tests passed?
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235876#comment-17235876 ] Drew Merrill commented on YARN-10427: - _*Anyone?*_ *I'd really appreciate a response from someone on this.* _*A developer? A fellow user? A computer?*_ Have I not included enough info, or the right info, needed to investigate this? If so, please let me know! *At the very least, can someone else please _confirm that the issue with duplicate Job IDs is reproducible?_ It's frustrating and stressful not knowing whether the problem is due to something _I'm doing wrong_ or a bug in Hadoop.* *There's either a teachable moment here where I can learn what I'm doing wrong, or an opportunity to identify and fix a bug in Hadoop. Both are good outcomes!* > Duplicate Job IDs in SLS output > --- > > Key: YARN-10427 > URL: https://issues.apache.org/jira/browse/YARN-10427 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Affects Versions: 3.0.0, 3.3.0, 3.2.1, 3.4.0 > Environment: I ran the attached inputs on my MacBook Pro, using > Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also > tested against the 3.2.1 and 3.3.0 release branches. > >Reporter: Drew Merrill >Priority: Major > Attachments: fair-scheduler.xml, inputsls.json, jobruntime.csv, > mapred-site.xml, sls-runner.xml, yarn-site.xml > > > Hello, I'm hoping someone can help me resolve or understand some issues I've > been having with the YARN Scheduler Load Simulator (SLS). I've been > experimenting with SLS for several months now at work, as we're trying to > build a simulation model to characterize our enterprise Hadoop infrastructure > for future capacity planning. In the process of attempting to > verify and validate the SLS output, I've encountered a number of issues, > including runtime exceptions and bad output. The focus of this issue is the > bad output. In all my simulation runs, the jobruntime.csv output seems to > have one or more of the following problems: no output, duplicate job IDs, > and/or missing job IDs. > > Because of where I work, I'm unable to provide the exact inputs I typically > use, but I'm able to reproduce the duplicate Job IDs problem using > some simplified inputs and configuration files, which I've attached along > with the output I obtained. > > The command I used to run the simulation: > {{./runsls.sh --tracetype=SLS --tracelocation=./inputsls.json --output-dir=sls-run-1 --print-simulation --track-jobs=job_1,job_2,job_3,job_4,job_5,job_6,job_7,job_8,job_9,job_10}} > > Can anyone help me understand what would cause the duplicate Job IDs in the > output? Is this a bug in Hadoop or a problem with my inputs? Thanks in > advance. > > PS: This is the first issue I've ever opened, so please be kind if I've missed > something or am not understanding something obvious about the way Hadoop > works. I'll gladly follow up with more info as requested.
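The duplicate and missing job ID symptoms described above can be checked mechanically. The Java sketch below is a hypothetical helper, not part of SLS. It assumes that {{jobruntime.csv}} starts with a header line and that the first comma-separated column of each data row is the job ID. It compares the IDs against job_1 through job_10, the values passed to {{--track-jobs}} in the report. If the actual file layout differs, adjust the parsing accordingly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical checker for an SLS jobruntime.csv file.
 * Assumes: the first line is a header, and the job ID is the first
 * comma-separated column of every other line.
 */
public class JobRuntimeCsvCheck {

    /** Counts how many data rows mention each job ID. */
    static Map<String, Integer> countJobIds(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 1; i < lines.size(); i++) {          // i = 1 skips the header
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            String jobId = line.split(",", 2)[0].trim();
            counts.merge(jobId, 1, Integer::sum);
        }
        return counts;
    }

    /** Job IDs that appear on more than one row. */
    static List<String> duplicates(Map<String, Integer> counts) {
        List<String> result = new ArrayList<>();
        counts.forEach((id, n) -> {
            if (n > 1) {
                result.add(id);
            }
        });
        return result;
    }

    /** Expected job IDs that never appear in the file. */
    static List<String> missing(Map<String, Integer> counts, Collection<String> expected) {
        List<String> result = new ArrayList<>();
        for (String id : expected) {
            if (!counts.containsKey(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Default path assumes the --output-dir=sls-run-1 used in the report.
        Path csv = Path.of(args.length > 0 ? args[0] : "sls-run-1/jobruntime.csv");
        Map<String, Integer> counts = countJobIds(Files.readAllLines(csv));
        List<String> expected = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            expected.add("job_" + i);                     // mirrors --track-jobs=job_1,...,job_10
        }
        System.out.println("duplicate job IDs: " + duplicates(counts));
        System.out.println("missing job IDs:   " + missing(counts, expected));
    }
}
```

Running this helper against each run's output would show whether the duplicates are stable across runs or vary between them.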
[jira] [Commented] (YARN-10492) deadlock in rm
[ https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235833#comment-17235833 ] Wangda Tan commented on YARN-10492: --- That will be helpful, thanks Jufeng!
> deadlock in rm
>
> Key: YARN-10492
> URL: https://issues.apache.org/jira/browse/YARN-10492
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.1.1
> Reporter: brick yang
> Priority: Critical
> Labels: 3.1.1
>
> version: HDP-3.1.5.0-152 (Hadoop 3.1)
> capacity scheduler
> The RM sometimes does not transition to active.
> We found a deadlock in the jstack dump:
> "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry [0x7fee09633000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323)
>     - waiting to lock <0x00043e2e19d0> (a org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75)
>     at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
> "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x0003c0d0d6c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182)
>     at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53)
>     at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>     at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>     - locked <0x00043e2e19d0> (a org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>     at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>     at java.
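The two threads in the dump excerpt show one side of a lock-ordering problem:
* Handler 8 holds the {{AllocateResponseLock}} monitor, taken in {{ApplicationMasterService.allocate}}, while it is parked on a {{LeafQueue}} write lock.
* Handler 44 is blocked on that same monitor.

The owner of the write lock is not visible in the truncated dump.

The Java sketch below illustrates the general pattern only. It builds a two-lock inversion between a plain monitor and a {{ReentrantReadWriteLock}} write lock. It then detects the inversion in-process with {{ThreadMXBean.findDeadlockedThreads()}}, which covers both object monitors and ownable synchronizers. The lock and thread names are hypothetical, and the code is not ResourceManager code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical reproduction of a lock-order inversion between a monitor and
 * a ReentrantReadWriteLock write lock, detected via ThreadMXBean.
 * The names only mimic the locks seen in the jstack excerpt.
 */
public class LockOrderDeadlockDemo {

    /** Stand-in for a monitor such as AllocateResponseLock. */
    private static final Object responseLock = new Object();
    /** Stand-in for a queue's read/write lock. */
    private static final ReentrantReadWriteLock queueLock = new ReentrantReadWriteLock();

    /** Starts two daemon threads that take the same two locks in opposite order. */
    static void startDeadlockedThreads() {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread allocator = new Thread(() -> {
            synchronized (responseLock) {              // 1st: the monitor
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                queueLock.writeLock().lock();          // 2nd: the write lock, blocks forever
            }
        }, "allocator");
        Thread finisher = new Thread(() -> {
            queueLock.writeLock().lock();              // 1st: the write lock
            bothHoldFirstLock.countDown();
            awaitQuietly(bothHoldFirstLock);
            synchronized (responseLock) {              // 2nd: the monitor, blocks forever
                System.out.println("unreachable");
            }
        }, "finisher");
        allocator.setDaemon(true);                     // lets the JVM exit despite the deadlock
        finisher.setDaemon(true);
        allocator.start();
        finisher.start();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Polls ThreadMXBean until a deadlock cycle is reported or the timeout expires. */
    static long[] waitForDeadlock(long timeoutMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads();   // monitors and ownable synchronizers
            if (ids != null) {
                return ids;
            }
            Thread.sleep(50);
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        startDeadlockedThreads();
        long[] ids = waitForDeadlock(5_000);
        if (ids == null) {
            System.out.println("no deadlock detected");
            return;
        }
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().getThreadInfo(ids)) {
            System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                + " held by " + info.getLockOwnerName());
        }
    }
}
```

The usual remedies are to acquire the two locks in one global order, or to release the monitor before calling into the scheduler. Whether either fits the RM code paths above would need to be confirmed against the actual patch.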
[jira] [Commented] (YARN-10492) deadlock in rm
[ https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235828#comment-17235828 ] jufeng li commented on YARN-10492: -- We are using the same version(HDP-3.1.5.0-152 hadoop3.1),and we got the same issue,I solved this issue,do you want patch? > deadlock in rm > --- > > Key: YARN-10492 > URL: https://issues.apache.org/jira/browse/YARN-10492 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.1.1 >Reporter: brick yang >Priority: Critical > Labels: 3.1.1 > > version: HDP-3.1.5.0-152 hadoop3.1 > capacity scheduler > yarn sometimes not change to active > we found that jstack dump has deadlocked: > "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 > tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry > [0x7fee09633000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323) > - waiting to lock <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > > > > > > > "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 > tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0003c0d0d6c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53) > at > 
org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433) > - locked <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server
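The excerpt above shows handler 8 holding the AllocateResponseLock monitor while parked on a ReentrantReadWriteLock write lock in LeafQueue.completedContainer, and handler 44 blocked on that same monitor; the holder of the queue lock is cut off. As a hedged diagnostic sketch (not taken from any attached patch; the class name DeadlockProbe is hypothetical), the standard java.lang.management API can report threads stuck in a lock cycle from inside the RM JVM. It only finds true cycles: holders of a ReentrantReadWriteLock read lock are not tracked as owners, and a lost-wakeup problem inside the lock itself would leave no cycle to report, so an empty result does not rule out a hang.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: describes threads the JVM reports as deadlocked. */
class DeadlockProbe {

  /**
   * Returns one line per deadlocked thread, or an empty list if no cycle exists.
   * findDeadlockedThreads() covers object monitors (synchronized) as well as
   * ownable synchronizers such as a ReentrantReadWriteLock write lock.
   */
  static List<String> describeDeadlocks() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    long[] ids = mx.findDeadlockedThreads(); // null when there is no deadlock
    List<String> report = new ArrayList<>();
    if (ids == null) {
      return report;
    }
    // Ask for locked monitors/synchronizers only when the JVM supports it.
    ThreadInfo[] infos = mx.getThreadInfo(ids,
        mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
    for (ThreadInfo info : infos) {
      if (info == null) {
        continue; // the thread terminated after detection
      }
      report.add(info.getThreadName() + " waiting on " + info.getLockName()
          + " owned by " + info.getLockOwnerName());
    }
    return report;
  }
}
```

Logging DeadlockProbe.describeDeadlocks() periodically (or from an admin endpoint) would print the same cycle that jstack shows, without attaching an external tool.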
[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235683#comment-17235683 ] Wangda Tan commented on YARN-10496: --- Worked with [~bteke] on a design doc; see the linked doc. I would like to see more comments from the community: cc: [~epayne], [~jhung], [~tangzhankun], [~bilwa_st] > [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler > - > > Key: YARN-10496 > URL: https://issues.apache.org/jira/browse/YARN-10496 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacity scheduler >Reporter: Wangda Tan >Priority: Major > > CapacityScheduler today doesn’t support auto queue creation that is > flexible enough. The current constraints: > * Only leaf queues can be auto-created > * A parent can only have either static queues or dynamic ones. This causes > multiple constraints. For example: > * It isn’t possible to have a VIP user like Alice with a static queue > root.user.alice with 50% capacity while the other user queues (under > root.user) are created dynamically and share the remaining 50% of > resources. > > * In comparison, FairScheduler allows the following scenarios, which Capacity > Scheduler doesn’t: > ** This implies that there is no possibility to have both dynamically > created and static queues at the same time under root > * A new queue needs to be created under an existing parent, while the parent > already has static queues > * Nested queue mapping policy, like in the following example: > * Here two levels of queues may need to be created. > If an application belongs to user _alice_ (who has the primary_group of > _engineering_), the scheduler checks whether _root.engineering_ exists and, if it > doesn’t, creates it. Then the scheduler checks whether > _root.engineering.alice_ exists, and creates it if it doesn't.
> > When we try to move users from FairScheduler to CapacityScheduler, we face > feature gaps that block users from migrating from FS to CS.
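The nested mapping rule described above (create _root.engineering_ if it is missing, then _root.engineering.alice_) amounts to creating every missing segment of a queue path from the root down, under parents that may already hold static queues. A minimal sketch under that reading, assuming a plain in-memory tree: QueueNode, newRoot, addStaticChild and createIfMissing are hypothetical names, not CapacityScheduler APIs, and capacities and ACLs are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical queue tree for illustration; not CapacityScheduler's CSQueue hierarchy. */
class QueueNode {
  final String path;        // full dotted path, e.g. "root.engineering"
  final boolean dynamic;    // true if the queue was auto-created
  final Map<String, QueueNode> children = new LinkedHashMap<>();

  private QueueNode(String path, boolean dynamic) {
    this.path = path;
    this.dynamic = dynamic;
  }

  static QueueNode newRoot() {
    return new QueueNode("root", false);
  }

  /** Adds a statically configured child, e.g. the VIP queue root.user.alice. */
  QueueNode addStaticChild(String name) {
    QueueNode child = new QueueNode(path + "." + name, false);
    children.put(name, child);
    return child;
  }

  /**
   * Walks a dotted path such as "root.engineering.alice" from this node and
   * auto-creates every missing segment, parents first. A parent may end up with
   * both static and dynamic children. Returns the paths that were created.
   */
  List<String> createIfMissing(String fullPath) {
    String prefix = path + ".";
    if (!fullPath.startsWith(prefix)) {
      throw new IllegalArgumentException(fullPath + " is not under " + path);
    }
    List<String> created = new ArrayList<>();
    QueueNode current = this;
    for (String name : fullPath.substring(prefix.length()).split("\\.")) {
      QueueNode next = current.children.get(name);
      if (next == null) {
        next = new QueueNode(current.path + "." + name, true);
        current.children.put(name, next);
        created.add(next.path);
      }
      current = next;
    }
    return created;
  }
}
```

For the VIP case, root.user.alice can be added with addStaticChild while root.user.bob is auto-created under the same parent, which is the mixed static/dynamic layout the description says CapacityScheduler does not allow today.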
[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235658#comment-17235658 ] Hadoop QA commented on YARN-10495: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 9s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:blue}0{color} | {color:blue} codespell {color} | {color:blue} 0m 2s{color} | | {color:blue} codespell was not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 8m 20s{color} | [/branch-mvninstall-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/branch-mvninstall-root.txt] | {color:red} root in trunk failed. 
{color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 10s{color} | [/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt] | {color:red} hadoop-yarn-server-nodemanager in trunk failed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 20s{color} | [/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt] | {color:red} hadoop-yarn-server-nodemanager in trunk failed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10. {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 10m 32s{color} | [/branch-shadedclient.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/branch-shadedclient.txt] | {color:red} branch has errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 20s{color} | [/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt] | {color:red} hadoop-yarn-server-nodemanager in trunk failed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 25s{color} | [/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt] | {color:red} hadoop-yarn-server-nodemanager in trunk failed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10. {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 23s{color} | [/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop
[jira] [Created] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
Wangda Tan created YARN-10496: - Summary: [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler Key: YARN-10496 URL: https://issues.apache.org/jira/browse/YARN-10496 Project: Hadoop YARN Issue Type: New Feature Components: capacity scheduler Reporter: Wangda Tan CapacityScheduler today doesn’t support auto queue creation that is flexible enough. The current constraints: * Only leaf queues can be auto-created * A parent can only have either static queues or dynamic ones. This causes multiple constraints. For example: * It isn’t possible to have a VIP user like Alice with a static queue root.user.alice with 50% capacity while the other user queues (under root.user) are created dynamically and share the remaining 50% of resources. * In comparison, FairScheduler allows the following scenarios, which Capacity Scheduler doesn’t: ** This implies that there is no possibility to have both dynamically created and static queues at the same time under root * A new queue needs to be created under an existing parent, while the parent already has static queues * Nested queue mapping policy, like in the following example: * Here two levels of queues may need to be created. If an application belongs to user _alice_ (who has the primary_group of _engineering_), the scheduler checks whether _root.engineering_ exists and, if it doesn’t, creates it. Then the scheduler checks whether _root.engineering.alice_ exists, and creates it if it doesn't. When we try to move users from FairScheduler to CapacityScheduler, we face feature gaps that block users from migrating from FS to CS.
[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235627#comment-17235627 ] Eric Badger commented on YARN-10495: Also, I've added you as a contributor in Hadoop Common, HDFS, Map/Reduce, and YARN, so you will now be able to assign JIRAs to yourself (as I've already done for you on this JIRA). > make the rpath of container-executor configurable > - > > Key: YARN-10495 > URL: https://issues.apache.org/jira/browse/YARN-10495 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Attachments: YARN-10495.001.patch > > > In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on > libcrypto to container-executor. We hit a case where our Jenkins machine has > libcrypto.so.1.0.0 in its shared library path, but our NodeManager machine > doesn't have libcrypto.so.1.0.0, only *libcrypto.so.1.1*. > We use an internal custom dynamic link library path, > /usr/lib/x86_64-linux-gnu, > and we build Hadoop with the following parameters: > {code:java} > -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu > {code} > > Shared library path /usr/lib/x86_64-linux-gnu on the Jenkins machine (where > libcrypto lives): > {code:java} > -rw-r--r-- 1 root root 240136 Nov 28 2014 libcroco-0.6.so.3.0.1 > -rw-r--r-- 1 root root 54550 Jun 18 2017 libcrypt.a > -rw-r--r-- 1 root root 4306444 Sep 26 2019 libcrypto.a > lrwxrwxrwx 1 root root 18 Sep 26 2019 libcrypto.so -> > libcrypto.so.1.0.0 > -rw-r--r-- 1 root root 2070976 Sep 26 2019 libcrypto.so.1.0.0 > lrwxrwxrwx 1 root root 35 Jun 18 2017 libcrypt.so -> > /lib/x86_64-linux-gnu/libcrypt.so.1 > -rw-r--r-- 1 root root 298 Jun 18 2017 libc.so > {code} > > Shared library path /usr/lib/x86_64-linux-gnu on the NodeManager machine (where > libcrypto lives): > {code:java} > -rw-r--r-- 1 root root 55852 Feb 7 2019 libcrypt.a > -rw-r--r-- 1 root root 4864244 Sep 28 2019 libcrypto.a > lrwxrwxrwx 1 root root 16 Sep 28 2019 libcrypto.so -> > libcrypto.so.1.1 > -rw-r--r-- 1 root root 2504576 Dec 24 2019 libcrypto.so.1.0.2 > -rw-r--r-- 1 root root 2715840 Sep 28 2019 libcrypto.so.1.1 > lrwxrwxrwx 1 root root 35 Feb 7 2019 libcrypt.so -> > /lib/x86_64-linux-gnu/libcrypt.so.1 > -rw-r--r-- 1 root root 298 Feb 7 2019 libc.so > {code} > We build container-executor with > The mismatched libcrypto.so version causes an error when we start the NodeManager: > > {code:java} > .. 3 more Caused by: > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: > ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: > error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared > object file: No such file or directory at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306) > ... 4 more Caused by: ExitCodeException exitCode=127: > /home/hadoop/hadoop/bin/container-executor: error while loading shared > libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file > or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at > org.apache.hadoop.util.Shell.run(Shell.java:901) at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154) > ... 6 more > {code} > > We should make the RPATH of container-executor configurable to solve this problem.
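The failure above comes from container-executor depending on the exact soname libcrypto.so.1.0.0, while the NodeManager host only has libcrypto.so.1.0.2 and libcrypto.so.1.1. The sketch below is a deliberately simplified model of that lookup (it ignores LD_LIBRARY_PATH, RPATH vs. RUNPATH precedence and ld.so.cache). It assumes a copy of libcrypto.so.1.0.0 is shipped in a directory such as /opt/hadoop/lib/native (a hypothetical path) and shows why putting that directory first in a configurable RPATH would let the binary start. SonameLookup and resolve are illustrative names only.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical, simplified model of shared-library lookup by soname.
 * It ignores LD_LIBRARY_PATH, RPATH vs. RUNPATH precedence and ld.so.cache.
 */
final class SonameLookup {
  private SonameLookup() {}

  /**
   * Returns the path of the first directory in searchPath that contains a file
   * named exactly soname. Matching is by exact file name, so a directory with
   * only libcrypto.so.1.1 does not satisfy a dependency on libcrypto.so.1.0.0.
   */
  static Optional<String> resolve(String soname, List<String> searchPath,
      Map<String, Set<String>> dirListings) {
    for (String dir : searchPath) {
      Set<String> files = dirListings.getOrDefault(dir, Collections.<String>emptySet());
      if (files.contains(soname)) {
        return Optional.of(dir + "/" + soname);
      }
    }
    // Nothing found: this corresponds to "cannot open shared object file" above.
    return Optional.empty();
  }
}
```

With only /usr/lib/x86_64-linux-gnu on the search path, the NodeManager listing yields no match; prepending the bundled directory resolves the dependency without touching the system libraries.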
[jira] [Assigned] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger reassigned YARN-10495: -- Assignee: angerszhu
[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235625#comment-17235625 ] Eric Badger commented on YARN-10495: [~angerszhuuu], I imagine the {{-Dbundle.openssl}} adds the libcrypto.so library to {{../lib/native}} of the build that is created? I don't have experience with this flag. Also, have you tested this out in your environment?
[jira] [Commented] (YARN-10482) Capacity Scheduler seems locked,RM cannot submit any new job,and change active RM manually return to normal
[ https://issues.apache.org/jira/browse/YARN-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235542#comment-17235542 ] Wanqiang Ji commented on YARN-10482: I discussed this with [~Jufeng] offline many days ago, and it seems to be caused by a JUC bug that has been fixed in JDK 9. [https://bugs.openjdk.java.net/browse/JDK-8134855] Maybe YARN-10492 encountered the same problem. cc: [~wangda], [~snemeth] > Capacity Scheduler seems locked,RM cannot submit any new job,and change > active RM manually return to normal > > > Key: YARN-10482 > URL: https://issues.apache.org/jira/browse/YARN-10482 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler, resourcemanager, > RM >Affects Versions: 3.1.1 >Reporter: jufeng li >Priority: Blocker > Attachments: RM_normal_state.stack, RM_unnormal_state.stack > > > Capacity Scheduler seems locked: no new job can be submitted to the RM, and > manually switching the active RM returns it to normal. It's a serious bug! I checked the stack > log and found some information about *ReentrantReadWriteLock*. Can anyone solve > this issue? I uploaded the stacks from when the RM was in a normal and an abnormal state. The RM hangs > forever until I restart it or manually switch the active RM!!
[jira] [Commented] (YARN-8737) Race condition in ParentQueue when reinitializing and sorting child queues in the meanwhile
[ https://issues.apache.org/jira/browse/YARN-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235386#comment-17235386 ] Benjamin Teke commented on YARN-8737: - The test issue seems to be unrelated, so +1 (non-binding) on my part. > Race condition in ParentQueue when reinitializing and sorting child queues in > the meanwhile > --- > > Key: YARN-8737 > URL: https://issues.apache.org/jira/browse/YARN-8737 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.3.0, 2.9.3, 3.2.2, 3.1.4 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Attachments: YARN-8737.001.patch > > > An administrator issued a queue update through the REST API. In the RM, the parent queue > is refreshing its child queues by calling ParentQueue#reinitialize while, > at the same time, async-schedule threads are sorting the child queues in > ParentQueue#sortAndGetChildrenAllocationIterator. A race condition may occur > and throw the following exception, because TimSort does not handle concurrent > modification of the objects it is sorting: > {noformat} > java.lang.IllegalArgumentException: Comparison method violates its general > contract!
> at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeCollapse(TimSort.java:441) > at java.util.TimSort.sort(TimSort.java:245) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:291) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:804) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:817) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:636) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:2494) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:2431) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersOnMultiNodes(CapacityScheduler.java:2588) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:2676) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.scheduleBasedOnNodeLabels(CapacityScheduler.java:927) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:962) > {noformat} > I think we can take a read lock in > ParentQueue#sortAndGetChildrenAllocationIterator to solve this problem, since the > write lock is held while updating child queues in > ParentQueue#reinitialize.
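The proposed fix pairs a read lock around sorting with the write lock that reinitialize already holds while it updates child queues. A minimal sketch of that pattern, with hypothetical class and method names standing in for ParentQueue: reinitialize mutates child state in place under the write lock, and the sort holds the read lock for its whole duration, so the comparator's inputs cannot change while TimSort runs. For simplicity it returns a list of names instead of an iterator. In a live scheduler the values a queue-ordering comparator reads may also change outside reinitialize; this sketch only models the reinitialize race described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical child queue; usedCapacity is changed in place by reinitialize. */
final class ChildQueue {
  final String name;
  final int priority;
  double usedCapacity; // guarded by the parent's lock

  ChildQueue(String name, int priority, double usedCapacity) {
    this.name = name;
    this.priority = priority;
    this.usedCapacity = usedCapacity;
  }
}

/** Hypothetical stand-in for ParentQueue that only shows the proposed locking. */
class ParentQueueSketch {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
  private final List<ChildQueue> children = new ArrayList<>();

  // Higher priority first, then lower used capacity first.
  private static final Comparator<ChildQueue> ORDER =
      Comparator.comparingInt((ChildQueue q) -> q.priority).reversed()
          .thenComparingDouble(q -> q.usedCapacity);

  void addChild(ChildQueue q) {
    rwLock.writeLock().lock();
    try {
      children.add(q);
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  /** Analogue of reinitialize: mutates child state under the write lock. */
  void reinitialize(Map<String, Double> newUsedCapacity) {
    rwLock.writeLock().lock();
    try {
      for (ChildQueue q : children) {
        Double updated = newUsedCapacity.get(q.name);
        if (updated != null) {
          q.usedCapacity = updated;
        }
      }
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  /**
   * Analogue of sortAndGetChildrenAllocationIterator. The read lock is held for
   * the whole sort, so reinitialize cannot change comparator inputs mid-sort.
   * Each caller sorts its own copy, so concurrent readers do not interfere.
   */
  List<String> sortedChildNames() {
    rwLock.readLock().lock();
    try {
      List<ChildQueue> snapshot = new ArrayList<>(children);
      snapshot.sort(ORDER);
      List<String> names = new ArrayList<>(snapshot.size());
      for (ChildQueue q : snapshot) {
        names.add(q.name);
      }
      return names;
    } finally {
      rwLock.readLock().unlock();
    }
  }
}
```

Because readers share the lock, several async-schedule threads can still sort concurrently; only a reinitialize has to wait for in-flight sorts to finish.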
[jira] [Issue Comment Deleted] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated YARN-3585: - Comment: was deleted (was: Leveldb will have problems using logger) > NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled > -- > > Key: YARN-3585 > URL: https://issues.apache.org/jira/browse/YARN-3585 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Rohith Sharma K S >Priority: Critical > Labels: 2.6.1-candidate > Fix For: 2.6.1, 2.8.0, 2.7.1, 3.0.0-alpha1 > > Attachments: 0001-YARN-3585.patch, YARN-3585.patch > > > With NM recovery enabled, after decommission, the NodeManager log shows that it stopped, but the > process cannot exit. > Non-daemon threads: > {noformat} > "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on > condition [0x] > "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable > [0x] > "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable > "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 > nid=0x29ed runnable > "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 > nid=0x29ee runnable > "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 > nid=0x29ef runnable > "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 > nid=0x29f0 runnable > "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 > nid=0x29f1 runnable > "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 > nid=0x29f2 runnable > "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 > nid=0x29f3 runnable > "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 > nid=0x29f4 runnable > "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 > runnable > "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 > nid=0x29f5 runnable > "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 > nid=0x29f6 runnable > "VM Periodic Task Thread" prio=10
tid=0x7f346019f800 nid=0x2a01 waiting > on condition > {noformat} > and the JNI leveldb thread stack: > {noformat} > Thread 12 (Thread 0x7f33dd842700 (LWP 10903)): > #0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x7f33dfce2a3b in leveldb::(anonymous > namespace)::PosixEnv::BGThreadWrapper(void*) () from > /tmp/libleveldbjni-64-1-6922178968300745716.8 > #2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0 > #3 0x003d830e811d in clone () from /lib64/libc.so.6 > {noformat}
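In the non-daemon thread list above, "DestroyJavaVM" is waiting: that is the JVM waiting for the remaining non-daemon threads (here the "leveldb" thread) to finish before it can exit. A small, hedged diagnostic sketch (the class name NonDaemonThreads is hypothetical and not part of the attached patches) that lists live non-daemon threads other than the caller, a quick way to see what keeps a JVM alive after shutdown. It only reports threads visible through Thread.getAllStackTraces(), so VM-internal threads such as the GC workers in the dump are not listed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: names the live threads that would keep the JVM from exiting. */
final class NonDaemonThreads {
  private NonDaemonThreads() {}

  /** Names of live non-daemon threads, excluding the calling thread. */
  static List<String> list() {
    List<String> names = new ArrayList<>();
    Thread self = Thread.currentThread();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t != self && t.isAlive() && !t.isDaemon()) {
        names.add(t.getName());
      }
    }
    return names;
  }
}
```

Logging NonDaemonThreads.list() at the end of service stop would name any thread still blocking exit, such as leveldb's background thread in this report.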
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235357#comment-17235357 ] dzcxzl commented on YARN-3585: -- LevelDB will have problems when using a logger. > NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled > -- > > Key: YARN-3585 > URL: https://issues.apache.org/jira/browse/YARN-3585 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Rohith Sharma K S >Priority: Critical > Labels: 2.6.1-candidate > Fix For: 2.6.1, 2.8.0, 2.7.1, 3.0.0-alpha1 > > Attachments: 0001-YARN-3585.patch, YARN-3585.patch > > > With NM recovery enabled, after decommissioning, the NodeManager log shows that it stopped, but > the process does not exit. > Non-daemon threads: > {noformat} > "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on > condition [0x] > "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable > [0x] > "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable > "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 > nid=0x29ed runnable > "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 > nid=0x29ee runnable > "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 > nid=0x29ef runnable > "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 > nid=0x29f0 runnable > "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 > nid=0x29f1 runnable > "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 > nid=0x29f2 runnable > "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 > nid=0x29f3 runnable > "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 > nid=0x29f4 runnable > "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 > runnable > "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 > nid=0x29f5 runnable > "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 > nid=0x29f6 runnable > "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting > on condition > {noformat} > and the JNI leveldb thread stack: > {noformat} > Thread 12 (Thread 0x7f33dd842700 (LWP 10903)): > #0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x7f33dfce2a3b in leveldb::(anonymous > namespace)::PosixEnv::BGThreadWrapper(void*) () from > /tmp/libleveldbjni-64-1-6922178968300745716.8 > #2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0 > #3 0x003d830e811d in clone () from /lib64/libc.so.6 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
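The JVM only exits on its own once every non-daemon thread has finished, so a lingering non-daemon thread such as the "leveldb" entry in the dump above can keep the NodeManager process alive after it logs that it stopped. Below is a minimal Java sketch for spotting such threads from inside a process. It assumes only the standard `Thread.getAllStackTraces()` API; the class name and the `stuck-worker` thread are hypothetical. It can only see threads that are attached to the JVM, so purely native threads that never attached will not appear.

```java
import java.util.ArrayList;
import java.util.List;

public class NonDaemonThreads {

    // Returns the names of live, non-daemon threads other than the calling thread.
    // Any such thread keeps the JVM running after main() returns.
    static List<String> nonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        Thread self = Thread.currentThread();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t != self && t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical worker that blocks indefinitely, standing in for a thread that never ends.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                // Interrupted: return so the thread terminates.
            }
        }, "stuck-worker");
        worker.start();

        System.out.println("Non-daemon threads: " + nonDaemonThreadNames());

        // Without this interrupt, the JVM would not exit after main() returns.
        worker.interrupt();
        worker.join();
    }
}
```

Marking a helper thread with `setDaemon(true)` before starting it, or explicitly stopping it during shutdown, are the usual ways to keep it from blocking JVM exit.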
[jira] [Updated] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated YARN-10495: - Description: In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto to container-executor. We hit a case where our Jenkins machine has libcrypto.so.1.0.0 in its shared library environment, but our NodeManager machine does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*. We use an internal custom dynamic link library environment, /usr/lib/x86_64-linux-gnu, and we build Hadoop with the following parameters: {code:java} -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu {code} Shared library path on the Jenkins machine, /usr/lib/x86_64-linux-gnu (where libcrypto is located): {code:java} -rw-r--r-- 1 root root 240136 Nov 28 2014 libcroco-0.6.so.3.0.1 -rw-r--r-- 1 root root 54550 Jun 18 2017 libcrypt.a -rw-r--r-- 1 root root 4306444 Sep 26 2019 libcrypto.a lrwxrwxrwx 1 root root 18 Sep 26 2019 libcrypto.so -> libcrypto.so.1.0.0 -rw-r--r-- 1 root root 2070976 Sep 26 2019 libcrypto.so.1.0.0 lrwxrwxrwx 1 root root 35 Jun 18 2017 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1 -rw-r--r-- 1 root root 298 Jun 18 2017 libc.so {code} Shared library path on the NodeManager machine, /usr/lib/x86_64-linux-gnu (where libcrypto is located): {code:java} -rw-r--r-- 1 root root 55852 Feb 7 2019 libcrypt.a -rw-r--r-- 1 root root 4864244 Sep 28 2019 libcrypto.a lrwxrwxrwx 1 root root 16 Sep 28 2019 libcrypto.so -> libcrypto.so.1.1 -rw-r--r-- 1 root root 2504576 Dec 24 2019 libcrypto.so.1.0.2 -rw-r--r-- 1 root root 2715840 Sep 28 2019 libcrypto.so.1.1 lrwxrwxrwx 1 root root 35 Feb 7 2019 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1 -rw-r--r-- 1 root root 298 Feb 7 2019 libc.so {code} Our container-executor build is linked against libcrypto.so.1.0.0, and this libcrypto.so version mismatch causes an error when we start the NodeManager: {code:java} .. 3 more Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306) ... 4 more Caused by: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154) ... 6 more {code} We should make the RPATH of container-executor configurable to solve this problem.
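The error above comes from the dynamic loader. container-executor records the exact soname `libcrypto.so.1.0.0` as a needed library, so a `libcrypto.so.1.1` in the same directory does not satisfy it, and the loader gives up with exit code 127. Below is a simplified Java model of that lookup. It assumes an ordered list of search directories in which a configurable RPATH entry comes first. It deliberately ignores ld.so.cache, `LD_LIBRARY_PATH` (which the loader disregards for setuid binaries such as container-executor), and the RPATH/RUNPATH distinction. The class, the method names, and the idea of passing the RPATH as a colon-separated string are hypothetical, not the patch's actual mechanism.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class SonameLookup {

    // Simplified model of resolving one DT_NEEDED entry: walk the search
    // directories in order and take the first file whose name matches the
    // soname exactly. A different version (e.g. libcrypto.so.1.1) never
    // satisfies a request for libcrypto.so.1.0.0.
    static Optional<Path> resolve(String soname, List<Path> searchDirs) {
        for (Path dir : searchDirs) {
            Path candidate = dir.resolve(soname);
            if (Files.isRegularFile(candidate)) { // follows symlinks
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Hypothetical helper: put the entries of a configured, colon-separated
    // RPATH (if any) ahead of the default system directories.
    static List<Path> searchOrder(String configuredRpath, List<Path> defaults) {
        List<Path> order = new ArrayList<>();
        if (configuredRpath != null && !configuredRpath.isEmpty()) {
            for (String entry : configuredRpath.split(":")) {
                order.add(Paths.get(entry));
            }
        }
        order.addAll(defaults);
        return order;
    }

    public static void main(String[] args) {
        List<Path> defaults = List.of(
                Paths.get("/lib/x86_64-linux-gnu"), Paths.get("/usr/lib/x86_64-linux-gnu"));
        // Optionally pass a custom RPATH as the first argument.
        List<Path> order = searchOrder(args.length > 0 ? args[0] : null, defaults);
        Optional<Path> hit = resolve("libcrypto.so.1.0.0", order);
        System.out.println(hit.map(p -> "found " + p)
                .orElse("libcrypto.so.1.0.0 not found: the loader would fail with exit code 127"));
    }
}
```

In this model, a configurable RPATH simply lets the build point container-executor at a directory that actually contains the soname it was linked against. Installing a matching library version on every NodeManager host is the alternative.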
[jira] [Updated] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated YARN-10495: - Description: In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto to container-executor. We hit a case where our Jenkins machine has libcrypto.so.1.0.0 in its shared library environment, but our NodeManager machine does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*. We use an internal custom dynamic link library environment, /usr/lib/x86_64-linux-gnu, and we build Hadoop with the following parameters: {code:java} -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu {code} Shared library path on the Jenkins machine, /usr/lib/x86_64-linux-gnu (where libcrypto is located): {code:java} -rw-r--r-- 1 root root 240136 Nov 28 2014 libcroco-0.6.so.3.0.1 -rw-r--r-- 1 root root 54550 Jun 18 2017 libcrypt.a -rw-r--r-- 1 root root 4306444 Sep 26 2019 libcrypto.a lrwxrwxrwx 1 root root 18 Sep 26 2019 libcrypto.so -> libcrypto.so.1.0.0 -rw-r--r-- 1 root root 2070976 Sep 26 2019 libcrypto.so.1.0.0 lrwxrwxrwx 1 root root 35 Jun 18 2017 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1 -rw-r--r-- 1 root root 298 Jun 18 2017 libc.so {code} Shared library path on the NodeManager machine, /usr/lib/x86_64-linux-gnu (where libcrypto is located): {code:java} -rw-r--r-- 1 root root 55852 Feb 7 2019 libcrypt.a -rw-r--r-- 1 root root 4864244 Sep 28 2019 libcrypto.a lrwxrwxrwx 1 root root 16 Sep 28 2019 libcrypto.so -> libcrypto.so.1.1 -rw-r--r-- 1 root root 2504576 Dec 24 2019 libcrypto.so.1.0.2 -rw-r--r-- 1 root root 2715840 Sep 28 2019 libcrypto.so.1.1 lrwxrwxrwx 1 root root 35 Feb 7 2019 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1 -rw-r--r-- 1 root root 298 Feb 7 2019 libc.so {code} Our container-executor build is linked against libcrypto.so.1.0.0, and this libcrypto.so version mismatch causes an error when we start the NodeManager: {code:java} .. 3 more Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306) ... 4 more Caused by: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154) ... 6 more {code} We should make the RPATH of container-executor configurable to solve this problem.
[jira] [Updated] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated YARN-10495: - Description: In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto to container-executor. We hit a case where our Jenkins machine has libcrypto.so.1.0.0 in its shared library environment, but our NodeManager machine does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*. We build Hadoop with: {code:java} -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu {code} Shared library path on the Jenkins machine, /usr/lib/x86_64-linux-gnu (where libcrypto is located): {code:java} -rw-r--r-- 1 root root 240136 Nov 28 2014 libcroco-0.6.so.3.0.1 -rw-r--r-- 1 root root 54550 Jun 18 2017 libcrypt.a -rw-r--r-- 1 root root 4306444 Sep 26 2019 libcrypto.a lrwxrwxrwx 1 root root 18 Sep 26 2019 libcrypto.so -> libcrypto.so.1.0.0 -rw-r--r-- 1 root root 2070976 Sep 26 2019 libcrypto.so.1.0.0 lrwxrwxrwx 1 root root 35 Jun 18 2017 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1 -rw-r--r-- 1 root root 298 Jun 18 2017 libc.so {code} Shared library path on the NodeManager machine, /usr/lib/x86_64-linux-gnu (where libcrypto is located): {code:java} -rw-r--r-- 1 root root 55852 Feb 7 2019 libcrypt.a -rw-r--r-- 1 root root 4864244 Sep 28 2019 libcrypto.a lrwxrwxrwx 1 root root 16 Sep 28 2019 libcrypto.so -> libcrypto.so.1.1 -rw-r--r-- 1 root root 2504576 Dec 24 2019 libcrypto.so.1.0.2 -rw-r--r-- 1 root root 2715840 Sep 28 2019 libcrypto.so.1.1 lrwxrwxrwx 1 root root 35 Feb 7 2019 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1 -rw-r--r-- 1 root root 298 Feb 7 2019 libc.so {code} Our container-executor build is linked against libcrypto.so.1.0.0, and this libcrypto.so version mismatch causes an error when we start the NodeManager: {code:java} .. 3 more Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306) ... 4 more Caused by: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154) ... 6 more {code} We should make the RPATH of container-executor configurable to solve this problem.
[jira] [Updated] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated YARN-10495: - Attachment: YARN-10495.001.patch > make the rpath of container-executor configurable > - > > Key: YARN-10495 > URL: https://issues.apache.org/jira/browse/YARN-10495 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: angerszhu >Priority: Major > Attachments: YARN-10495.001.patch > > > In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto to container-executor. We hit a case where our Jenkins machine has libcrypto.so.1.0.0 in its shared library environment, but our NodeManager machine does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*. > We build Hadoop with: > {code:java} > -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu > {code} > > /usr/lib/x86_64-linux-gnu on the Jenkins machine: > {code:java} > -rw-r--r-- 1 root root 240136 Nov 28 2014 libcroco-0.6.so.3.0.1 > -rw-r--r-- 1 root root 54550 Jun 18 2017 libcrypt.a > -rw-r--r-- 1 root root 4306444 Sep 26 2019 libcrypto.a > lrwxrwxrwx 1 root root 18 Sep 26 2019 libcrypto.so -> libcrypto.so.1.0.0 > -rw-r--r-- 1 root root 2070976 Sep 26 2019 libcrypto.so.1.0.0 > lrwxrwxrwx 1 root root 35 Jun 18 2017 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1 > -rw-r--r-- 1 root root 298 Jun 18 2017 libc.so > {code} > > /usr/lib/x86_64-linux-gnu on the NodeManager machine: > {code:java} > -rw-r--r-- 1 root root 55852 Feb 7 2019 libcrypt.a > -rw-r--r-- 1 root root 4864244 Sep 28 2019 libcrypto.a > lrwxrwxrwx 1 root root 16 Sep 28 2019 libcrypto.so -> libcrypto.so.1.1 > -rw-r--r-- 1 root root 2504576 Dec 24 2019 libcrypto.so.1.0.2 > -rw-r--r-- 1 root root 2715840 Sep 28 2019 libcrypto.so.1.1 > lrwxrwxrwx 1 root root 35 Feb 7 2019 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1 > -rw-r--r-- 1 root root 298 Feb 7 2019 libc.so > {code} > > The libcrypto.so version mismatch causes an error when we start the NodeManager: > > {code:java} > .. 3 more Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306) ... 4 more Caused by: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154) ... 6 more > {code} > > We should make the RPATH of container-executor configurable to solve this problem.
[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235311#comment-17235311 ] angerszhu commented on YARN-10495: -- ping [~ebadger] [~eyang]
[jira] [Commented] (YARN-10031) Create a general purpose log request with additional query parameters
[ https://issues.apache.org/jira/browse/YARN-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235310#comment-17235310 ] Hadoop QA commented on YARN-10031: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 37s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:blue}0{color} | {color:blue} codespell {color} | {color:blue} 0m 3s{color} | | {color:blue} codespell was not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | | {color:blue} Maven dependency ordering for branch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 24s{color} | [/branch-mvninstall-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-mvninstall-root.txt] | {color:red} root in trunk failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 24s{color} | [/branch-compile-root-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-compile-root-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt] | {color:red} root in trunk failed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. 
{color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 23s{color} | [/branch-compile-root-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-compile-root-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt] | {color:red} root in trunk failed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 20s{color} | [/buildtool-branch-checkstyle-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/buildtool-branch-checkstyle-root.txt] | {color:orange} The patch fails to run checkstyle in root {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 23s{color} | [/branch-mvnsite-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-mvnsite-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs.txt] | {color:red} hadoop-mapreduce-client-hs in trunk failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 28s{color} | [/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt] | {color:red} hadoop-yarn-common in trunk failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 24s{color} | [/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt] | {color:red} hadoop-yarn-server-common in trunk failed. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 2m 2s{color} | | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s{color} | [/branch-javadoc-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-javadoc-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt] | {color:red} hadoop-mapreduce-client-hs in trunk failed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s{color} | [/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/ar
[jira] [Updated] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated YARN-10495: - Description: In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto to container-executor. We hit a case where our Jenkins machine has libcrypto.so.1.0.0 in its shared library environment, but our NodeManager machine does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*. We build Hadoop with: {code:java} -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu {code} /usr/lib/x86_64-linux-gnu on the Jenkins machine: {code:java} -rw-r--r-- 1 root root 240136 Nov 28 2014 libcroco-0.6.so.3.0.1 -rw-r--r-- 1 root root 54550 Jun 18 2017 libcrypt.a -rw-r--r-- 1 root root 4306444 Sep 26 2019 libcrypto.a lrwxrwxrwx 1 root root 18 Sep 26 2019 libcrypto.so -> libcrypto.so.1.0.0 -rw-r--r-- 1 root root 2070976 Sep 26 2019 libcrypto.so.1.0.0 lrwxrwxrwx 1 root root 35 Jun 18 2017 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1 -rw-r--r-- 1 root root 298 Jun 18 2017 libc.so {code} /usr/lib/x86_64-linux-gnu on the NodeManager machine: {code:java} -rw-r--r-- 1 root root 55852 Feb 7 2019 libcrypt.a -rw-r--r-- 1 root root 4864244 Sep 28 2019 libcrypto.a lrwxrwxrwx 1 root root 16 Sep 28 2019 libcrypto.so -> libcrypto.so.1.1 -rw-r--r-- 1 root root 2504576 Dec 24 2019 libcrypto.so.1.0.2 -rw-r--r-- 1 root root 2715840 Sep 28 2019 libcrypto.so.1.1 lrwxrwxrwx 1 root root 35 Feb 7 2019 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1 -rw-r--r-- 1 root root 298 Feb 7 2019 libc.so {code} The libcrypto.so version mismatch causes an error when we start the NodeManager: {code:java} .. 3 more Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306) ... 4 more Caused by: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154) ... 6 more {code} We should make the RPATH of container-executor configurable to solve this problem.
[jira] [Created] (YARN-10495) make the rpath of container-executor configurable
angerszhu created YARN-10495: Summary: make the rpath of container-executor configurable Key: YARN-10495 URL: https://issues.apache.org/jira/browse/YARN-10495 Project: Hadoop YARN Issue Type: Bug Components: yarn Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
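As a rough illustration of why the YARN-10495 build fails on the NodeManager host, here is a simplified Java model of shared-library lookup. It assumes the loader tries the binary's RPATH directories first and then a default system directory, and that only a file named exactly after the needed SONAME (here libcrypto.so.1.0.0) counts as a match, so neither the unversioned libcrypto.so link nor libcrypto.so.1.1 helps. The class `RpathModel`, its `resolve` method, and the bundled directory `/home/hadoop/hadoop/lib/native` are hypothetical; the real glibc search order (LD_LIBRARY_PATH, RUNPATH, ld.so.cache) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model (not Hadoop or glibc code) of how the dynamic
// loader resolves a DT_NEEDED entry such as "libcrypto.so.1.0.0".
public final class RpathModel {

    private RpathModel() {}

    // Searches the RPATH directories first, then the default directories, and
    // returns the first file whose name equals the needed SONAME exactly.
    public static Optional<String> resolve(String soname,
                                           List<String> rpath,
                                           List<String> defaultDirs,
                                           Map<String, List<String>> filesByDir) {
        List<String> searchOrder = new ArrayList<>(rpath);
        searchOrder.addAll(defaultDirs);
        for (String dir : searchOrder) {
            List<String> files = filesByDir.getOrDefault(dir, List.of());
            // "libcrypto.so" or "libcrypto.so.1.1" do not satisfy "libcrypto.so.1.0.0".
            if (files.contains(soname)) {
                return Optional.of(dir + "/" + soname);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String needed = "libcrypto.so.1.0.0";
        String systemDir = "/usr/lib/x86_64-linux-gnu";
        // Hypothetical directory holding a bundled copy of the library.
        String bundledDir = "/home/hadoop/hadoop/lib/native";

        Map<String, List<String>> nodeManagerFs = new LinkedHashMap<>();
        // File names taken from the NodeManager listing and description in the report.
        nodeManagerFs.put(systemDir,
                List.of("libcrypto.so", "libcrypto.so.1.0.2", "libcrypto.so.1.1"));
        nodeManagerFs.put(bundledDir, List.of(needed));

        // No RPATH: only the system directory is searched, so lookup fails.
        System.out.println(resolve(needed, List.of(), List.of(systemDir), nodeManagerFs));
        // RPATH pointing at the bundled directory: the exact SONAME is found.
        System.out.println(resolve(needed, List.of(bundledDir), List.of(systemDir), nodeManagerFs));
    }
}
```

Running `main` prints `Optional.empty` when only the system directory is searched and `Optional[/home/hadoop/hadoop/lib/native/libcrypto.so.1.0.0]` once the hypothetical bundled directory is on the RPATH, which is the kind of behavior a configurable RPATH would aim for.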
[jira] [Commented] (YARN-10031) Create a general purpose log request with additional query parameters
[ https://issues.apache.org/jira/browse/YARN-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235284#comment-17235284 ] Andras Gyori commented on YARN-10031: - Fixed the checkstyle errors and javadoc issues. > Create a general purpose log request with additional query parameters > - > > Key: YARN-10031 > URL: https://issues.apache.org/jira/browse/YARN-10031 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Adam Antal >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10031-WIP.001.patch, YARN-10031.001.patch, > YARN-10031.002.patch, YARN-10031.003.patch, YARN-10031.004.patch, > YARN-10031.005.patch > > > The current endpoints are robust but not very flexible with regard to > filtering options. I suggest adding an endpoint which provides filtering > options. > E.g.: > In ATS we have multiple endpoints: > /containers/{containerid}/logs/{filename} > /containerlogs/{containerid}/{filename} > We could add @QueryParam parameters to the REST endpoints like this: > /containers/{containerid}/logs?fileName=stderr&containerState=FAILED&nodeId=nm45
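The filtering proposed in YARN-10031 can be sketched in plain Java under a few assumptions. In a real JAX-RS resource each parameter would be bound with an `@QueryParam("fileName")`-style annotation; here the query string from the example is parsed by hand so the sketch runs without a web container. The `LogQueryFilter` class, the `ContainerLog` record, and the container IDs are hypothetical and do not reflect the attached patches. An absent parameter is treated as "do not filter on this field".

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch (not the YARN-10031 patch): applies optional query
// parameters fileName, containerState and nodeId as filters over log metadata.
public final class LogQueryFilter {

    // Minimal stand-in for the metadata a log endpoint could return.
    public record ContainerLog(String containerId, String fileName,
                               String containerState, String nodeId) {}

    private LogQueryFilter() {}

    // Parses "a=1&b=2" into a map, URL-decoding keys and values.
    public static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new HashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // Keeps only the logs that match every parameter that was supplied.
    public static List<ContainerLog> filter(List<ContainerLog> logs, Map<String, String> params) {
        return logs.stream()
                .filter(l -> matches(params.get("fileName"), l.fileName()))
                .filter(l -> matches(params.get("containerState"), l.containerState()))
                .filter(l -> matches(params.get("nodeId"), l.nodeId()))
                .collect(Collectors.toList());
    }

    // A missing (null) parameter matches everything.
    private static boolean matches(String wanted, String actual) {
        return wanted == null || wanted.equals(actual);
    }

    public static void main(String[] args) {
        List<ContainerLog> logs = List.of(
                new ContainerLog("container_1", "stderr", "FAILED", "nm45"),
                new ContainerLog("container_1", "stdout", "FAILED", "nm45"),
                new ContainerLog("container_2", "stderr", "COMPLETE", "nm12"));
        Map<String, String> params =
                parseQuery("fileName=stderr&containerState=FAILED&nodeId=nm45");
        System.out.println(filter(logs, params).size());
    }
}
```

With the example query from the issue, only the first entry matches, so `main` prints `1`; dropping all parameters returns every log, which keeps the new endpoint a superset of the existing per-file ones.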
[jira] [Updated] (YARN-10031) Create a general purpose log request with additional query parameters
[ https://issues.apache.org/jira/browse/YARN-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Gyori updated YARN-10031: Attachment: YARN-10031.005.patch