Also, when we run the container-executor command directly to write entries into devices.deny, it reports "Unexpected operation code":
test@ip:/opt/hadoop-3.3.0$ sudo -U yarn /opt/hadoop-3.3.0/bin/container-executor --module-gpu --container_id container_e57_1667177358230_0650_01_000001 -excluded_gpus 1,2,3,4,5,6,7
[sudo] password for alpha:
CGroups: Updating cgroups, path=/sys/fs/cgroup/devices/yarn/container_e57_1667177358230_0650_01_000001/devices.deny, value=c 195:1 rwm
CGroups: Updating cgroups, path=/sys/fs/cgroup/devices/yarn/container_e57_1667177358230_0650_01_000001/devices.deny, value=c 195:2 rwm
CGroups: Updating cgroups, path=/sys/fs/cgroup/devices/yarn/container_e57_1667177358230_0650_01_000001/devices.deny, value=c 195:3 rwm
CGroups: Updating cgroups, path=/sys/fs/cgroup/devices/yarn/container_e57_1667177358230_0650_01_000001/devices.deny, value=c 195:4 rwm
CGroups: Updating cgroups, path=/sys/fs/cgroup/devices/yarn/container_e57_1667177358230_0650_01_000001/devices.deny, value=c 195:5 rwm
CGroups: Updating cgroups, path=/sys/fs/cgroup/devices/yarn/container_e57_1667177358230_0650_01_000001/devices.deny, value=c 195:6 rwm
CGroups: Updating cgroups, path=/sys/fs/cgroup/devices/yarn/container_e57_1667177358230_0650_01_000001/devices.deny, value=c 195:7 rwm
Unexpected operation code: -1
Nonzero exit code=3, error message='Invalid command provided'

Thanks,
Xiong

> On Oct 31, 2022, at 22:21, zxcs <zhuxion...@163.com> wrote:
>
> Hi, experts,
>
> we are using hadoop-3.3.0 and trying to use cpu and also enable gpu isolation,
> following the guide
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/UsingGpus.html
>
> but when we start a yarn job, the node manager always fails with "unexpected
> operation code: -1". Could any experts shed some light here? Thanks in
> advance!
>
> (sorry for the picture; we are not allowed to copy anything from the
> testbed to the outside)
>
> <Pasted Graphic-4.tiff>
>
> here is the yarn-site.xml config
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu</value>
> </property>
> <property>
>   <name>yarn.nodemanager.resource-plugins</name>
>   <value>yarn.io/gpu</value>
> </property>
>
> and below is container-executor.cfg
> yarn.nodemanager.linux-container-executor.group=hadoop
> banned.users=root
> min.user.id=500
> allowed.system.users=yarn
> [gpu]
> module.enabled=true
> [cgroups]
> root=/sys/fs/cgroup
> yarn-hierarchy=yarn
>
> below is the directory of /sys/fs/cgroup
> <Pasted Graphic-3.tiff>