Re: [gridengine users] Job in error states
On 07/03/2020 at 15:57, MacMullan IV, Hugh wrote:
> Or if it’s an NFS share, perhaps it’s become unmounted on one or more exec
> nodes.
>
> -Hugh
>
>> On Mar 7, 2020, at 10:55, Reuti wrote:
>>
>> Hi,
>>
>> is it always failing on one and the same node? Or are several nodes affected?
>> One guess could be that the file system is full.
>>
>> -- Reuti
>>
>>> On 05.03.2020 at 18:46, Jerome wrote:

Dear Reuti, Mac,

Thanks for your answers. The file system is not full, and it is not an NFS-mounted file system. The user cancelled the job, but I noticed this in the accounting report:

failed 27 : searching requested shell

It seems to be an error in the header of the job script, I suppose.

Regards

--
Jérôme
When a tree falls, you hear it; when the forest grows, not a sound. (African proverb)
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
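The guess about the script header can be tested directly. On Linux, executing a script that exists can still fail with "No such file or directory" when the interpreter named on its #! line cannot be found, for example because of a wrong path or a trailing carriage return. The sketch below is only an illustration under that assumption: check_shebang is a hypothetical helper, not part of Grid Engine, and whether the first line was the real cause of job 1311910 is not confirmed in this thread.

```python
import os


def check_shebang(script_path):
    """Return a list of likely problems with the first line of a script.

    Hypothetical helper for illustration; not a Grid Engine tool.
    """
    problems = []
    with open(script_path, "rb") as handle:
        first_line = handle.readline()

    if not first_line.startswith(b"#!"):
        problems.append("no shebang line")
        return problems

    # A Windows-style line ending makes the kernel look for "interpreter\r".
    if first_line.endswith(b"\r\n"):
        problems.append("CRLF line ending on shebang line")

    # The interpreter is the first whitespace-separated token after "#!".
    tokens = first_line[2:].strip().split()
    if not tokens:
        problems.append("shebang line names no interpreter")
        return problems

    interpreter = tokens[0].decode("utf-8", errors="replace")
    if not (os.path.isfile(interpreter) and os.access(interpreter, os.X_OK)):
        problems.append("interpreter not found or not executable: " + interpreter)
    return problems
```

It would have to be run on the execution node itself (compute-0-0 in the report above), against a copy of the user's script, since the interpreter path must exist on that node.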
Re: [gridengine users] Job in error states
Or if it’s an NFS share, perhaps it’s become unmounted on one or more exec nodes.

-Hugh

> On Mar 7, 2020, at 10:55, Reuti wrote:
>
> Hi,
>
> is it always failing on one and the same node? Or are several nodes affected?
> One guess could be that the file system is full.
>
> -- Reuti
>
>> On 05.03.2020 at 18:46, Jerome wrote:
>>
>> Dear all,
>>
>> I'm facing a strange error in SGE. One job is reported as being in an error
>> state, as shown below:
>>
>> ==
>> job_number: 1311910
>> exec_file: job_scripts/1311910
>> submission_time: Thu Mar 5 08:06:16 2020
>> owner: X
>>
>> ../..
>>
>> error reason 1: 03/05/2020 11:11:56 [6021:55928]:
>> execvlp(/opt/gridengine/default/spool/compute-0-0/job_scripts/1311910,
>> "/opt/gridengine/default/spool/compute-0-0/job_scripts/1311910") failed:
>> No such file or directory
>>
>> It seems to be a problem during the copy of the script file to the
>> node. But when I clear it with qmod -cj, the job goes back into the error
>> state.
>>
>> Could someone explain what might cause this error?
>>
>> Thanks!
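A quick way to check the unmounted-share idea on an exec node is to see which mount point a directory actually sits on. This is a minimal sketch using only the Python standard library; nearest_mount_point is a hypothetical name, and the spool path in the comment is simply the one from the error message above.

```python
import os


def nearest_mount_point(path):
    """Walk up from path until a directory that is a mount point is reached."""
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the top of the tree without finding a mount point.
            break
        current = parent
    return current


# Example call on an exec node (path taken from the error message):
# nearest_mount_point("/opt/gridengine/default/spool/compute-0-0/job_scripts")
```

If the result is "/" on a node where the directory is expected to live on a separate share, the share is probably not mounted there.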
Re: [gridengine users] Job in error states
Hi,

is it always failing on one and the same node? Or are several nodes affected?
One guess could be that the file system is full.

-- Reuti

> On 05.03.2020 at 18:46, Jerome wrote:
>
> Dear all,
>
> I'm facing a strange error in SGE. One job is reported as being in an error
> state, as shown below:
>
> ==
> job_number: 1311910
> exec_file: job_scripts/1311910
> submission_time: Thu Mar 5 08:06:16 2020
> owner: X
>
> ../..
>
> error reason 1: 03/05/2020 11:11:56 [6021:55928]:
> execvlp(/opt/gridengine/default/spool/compute-0-0/job_scripts/1311910,
> "/opt/gridengine/default/spool/compute-0-0/job_scripts/1311910") failed:
> No such file or directory
>
> It seems to be a problem during the copy of the script file to the
> node. But when I clear it with qmod -cj, the job goes back into the error
> state.
>
> Could someone explain what might cause this error?
>
> Thanks!
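To rule out a full file system on a node, both free space and free inodes are worth checking, since running out of either prevents new files from being written. A minimal sketch, assuming a Unix system where os.statvfs is available; filesystem_headroom is a hypothetical name.

```python
import os


def filesystem_headroom(path):
    """Return (free_bytes, free_inodes) available to unprivileged users."""
    stats = os.statvfs(path)
    free_bytes = stats.f_bavail * stats.f_frsize
    free_inodes = stats.f_favail
    return free_bytes, free_inodes


# Example call on an exec node (spool path taken from the error message):
# filesystem_headroom("/opt/gridengine/default/spool/compute-0-0")
```

Some file systems do not report inode counts and return 0 for them, so a zero inode figure should be cross-checked with df -i before drawing conclusions.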