Re: [gridengine users] Job in error states

2020-03-09 Thread Jerome
On 07/03/2020 at 15:57, MacMullan IV, Hugh wrote:
> Or if it’s an NFS share, perhaps it’s become unmounted on one or more exec 
> nodes.
> 
> -Hugh
> 
>> On Mar 7, 2020, at 10:55, Reuti wrote:
>>
>> Hi,
>>
>> Is it always failing on one and the same node? Or are several nodes affected? 
>> One guess could be that the file system is full.
>>
>> -- Reuti
>>
>>
>>> On 05.03.2020 at 18:46, Jerome wrote:
>>>

Dear Reuti, Mac,

Thanks for your answers. The file system is not full, and it is not an
NFS-mounted file system.
The user cancelled the job, but I noticed this in the accounting report:

failed   27  : searching requested shell

It seems to be an error in the header of the job script, I suppose.
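
A minimal sketch, in Python, of how the header of a job script could be
checked. It assumes the problem is the interpreter named on the first line,
which is only a guess; a "No such file or directory" from an exec call on a
script that does exist is a common symptom of a missing interpreter or of
Windows line endings. The function name check_shebang is hypothetical and is
not part of Grid Engine.

```python
import os


def check_shebang(script_path):
    """Return a list of problems found on the first line of a script.

    Hypothetical helper, not part of Grid Engine. It only inspects the
    file on the machine where it runs, so it should be run on the
    execution node to be meaningful.
    """
    problems = []
    with open(script_path, "rb") as fh:
        first_line = fh.readline()

    # A carriage return becomes part of the interpreter name at exec time.
    if first_line.endswith(b"\r\n"):
        problems.append("first line ends with CRLF (Windows line endings)")

    if not first_line.startswith(b"#!"):
        problems.append("no #! interpreter line")
        return problems

    parts = first_line[2:].strip().split()
    if not parts:
        problems.append("empty interpreter after #!")
        return problems

    interpreter = parts[0].decode(errors="replace")
    if not os.path.isfile(interpreter):
        problems.append(f"interpreter not found: {interpreter}")
    elif not os.access(interpreter, os.X_OK):
        problems.append(f"interpreter not executable: {interpreter}")
    return problems
```

Calling check_shebang() on a copy of the submitted script returns an empty
list when nothing suspicious is found on the first line. It does not look at
any shell requested through the scheduler itself.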

Regards

-- 
-- Jérôme
When a tree falls, you hear it; when the forest grows, not a sound.
(African proverb)
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Job in error states

2020-03-08 Thread MacMullan IV, Hugh
Or if it’s an NFS share, perhaps it’s become unmounted on one or more exec 
nodes.

-Hugh

> On Mar 7, 2020, at 10:55, Reuti wrote:
> 
> Hi,
> 
> Is it always failing on one and the same node? Or are several nodes affected? 
> One guess could be that the file system is full.
> 
> -- Reuti
> 
> 
>> On 05.03.2020 at 18:46, Jerome wrote:
>> 
>> Dear all
>> 
>> I'm facing a strange error in SGE. One job is reported as being in an
>> error state, as shown below:
>> 
>> 
>> ==
>> job_number:        1311910
>> exec_file:         job_scripts/1311910
>> submission_time:   Thu Mar  5 08:06:16 2020
>> owner:             X
>> 
>> ../..
>> 
>> error reason  1:  03/05/2020 11:11:56 [6021:55928]:
>> execvlp(/opt/gridengine/default/spool/compute-0-0/job_scripts/1311910,
>> "/opt/gridengine/default/spool/compute-0-0/job_scripts/1311910") failed:
>> No such file or directory
>> 
>> 
>> It seems to be a problem during the copy of the script file to the
>> node. But when I clear it with qmod -cj, the job comes back in the error
>> state.
>> 
>> Could someone explain what might cause this error?
>> 
>> Thanks!


Re: [gridengine users] Job in error states

2020-03-07 Thread Reuti
Hi,

Is it always failing on one and the same node? Or are several nodes affected? 
One guess could be that the file system is full.
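
A full file system is quick to rule out on each node. The following is a
minimal Python sketch, assuming it is run on the execution node against the
spool directory from the error message
(/opt/gridengine/default/spool/compute-0-0). The function name spool_status
and the 1 MiB threshold are hypothetical choices, not Grid Engine settings.

```python
import os
import shutil


def spool_status(path, min_free_bytes=1024 * 1024):
    """Summarise whether `path` exists, is a mount point, and has free space.

    Hypothetical helper; `min_free_bytes` is an arbitrary threshold.
    """
    if not os.path.isdir(path):
        # A missing directory can also mean a share that is not mounted.
        return {"exists": False, "is_mount": False,
                "free_bytes": 0, "low_space": True}

    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "exists": True,
        "is_mount": os.path.ismount(path),
        "free_bytes": usage.free,
        "low_space": usage.free < min_free_bytes,
    }
```

Comparing the result across nodes would show whether only one node is
affected. The sketch checks free bytes only; it does not detect a file
system that has run out of inodes.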

-- Reuti


> On 05.03.2020 at 18:46, Jerome wrote:
> 
> Dear all
> 
> I'm facing a strange error in SGE. One job is reported as being in an
> error state, as shown below:
> 
> 
> ==
> job_number:        1311910
> exec_file:         job_scripts/1311910
> submission_time:   Thu Mar  5 08:06:16 2020
> owner:             X
> 
> ../..
> 
> error reason  1:  03/05/2020 11:11:56 [6021:55928]:
> execvlp(/opt/gridengine/default/spool/compute-0-0/job_scripts/1311910,
> "/opt/gridengine/default/spool/compute-0-0/job_scripts/1311910") failed:
> No such file or directory
> 
> 
> It seems to be a problem during the copy of the script file to the
> node. But when I clear it with qmod -cj, the job comes back in the error
> state.
> 
> Could someone explain what might cause this error?
> 
> Thanks!
