Hello!

On Apr 30, 2015, at 6:02 PM, Scott Nolin wrote:

> Has anyone been working with the lustre jobstats feature and SLURM? We have 
> been, and it's OK. But now that I'm working on systems that run a lot of 
> array jobs and a fairly recent SLURM version, we've found some ugly stuff.
> 
> Array jobs do report their SLURM_JOBID as a variable, and it's unique for 
> every job. But they also use other IDs that appear only for array jobs.
> 
> http://slurm.schedmd.com/job_array.html
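
For reference, here is a rough sketch (Python, run inside an array task) of the 
IDs involved - SLURM_JOB_ID/SLURM_JOBID is the per-task one, and the 
SLURM_ARRAY_* pair is what names like 102_2 are built from:

    # Print the job IDs visible inside a single array task.
    import os

    for var in ("SLURM_JOB_ID",          # unique id of this task's job record
                "SLURM_ARRAY_JOB_ID",    # id of the array as a whole (e.g. 102)
                "SLURM_ARRAY_TASK_ID"):  # index within the array (e.g. 2)
        print("%s=%s" % (var, os.environ.get(var, "<unset>")))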
> 
> However, as far as I can tell that unique SLURM_JOBID is only really exposed 
> in the command line tools via 'scontrol' - which only works while the job is 
> running. If you want to look at older jobs with sacct, for example, things 
> get troublesome.
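
While the job is still running, one way to get at the raw id is to parse it 
out of 'scontrol show job'. A rough sketch in Python, assuming the 
'JobId=... ArrayJobId=... ArrayTaskId=...' fields that scontrol prints for 
array tasks:

    # Look up the raw job id of one array task while it is still running,
    # e.g. raw_jobid("102_2") -> "103". Once the job has finished and
    # scontrol has forgotten it, you are back to sacct.
    import re
    import subprocess

    def raw_jobid(array_task):              # array_task like "102_2"
        out = subprocess.check_output(
            ["scontrol", "show", "job", array_task],
            universal_newlines=True)
        m = re.search(r"\bJobId=(\d+)", out)
        return m.group(1) if m else None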
> 
> Here's what my coworker and I have figured out:
> 
> - You submit a (non-array) job that gets jobid 100.
> - The next job gets jobid 101.
> - Then submit a 10-task array job. That gets jobid 102. The subtasks get 9 
> more job ids. If nothing else is happening on the system, that means you use 
> jobids 102 to 111.
> 
> If things were that orderly, you could cope with using SLURM_JOB_ID in lustre 
> jobstats pretty easily. Use sacct and you see job 102_2 - you know that is 
> jobid 103 in lustre jobstats.
> 
> But if other jobs get submitted during setup (as of course they do), they 
> can take jobid 103. So you've got problems.
> 
> I think we may try to set a magic variable in the slurm prolog and use that 
> as the jobid_var for jobstats, but who knows.
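
Something like the following could be the prolog side of that idea - a rough 
sketch in Python, using SLURM's TaskProlog convention of printing 'export 
NAME=value' lines into the task environment. The variable name 
SLURM_LUSTRE_JOBID is made up here, and the lustre clients would have to be 
pointed at it via jobid_var for it to show up in jobstats:

    # Sketch of a TaskProlog: emit one "export NAME=value" line so the task
    # environment carries a jobstats-friendly id. Assumes the SLURM_ARRAY_*
    # variables are visible here for array tasks.
    import os

    job_id = os.environ.get("SLURM_JOB_ID", "unknown")
    array_job = os.environ.get("SLURM_ARRAY_JOB_ID")
    array_task = os.environ.get("SLURM_ARRAY_TASK_ID")

    if array_job and array_task:
        magic = "%s_%s" % (array_job, array_task)   # e.g. "102_2"
    else:
        magic = job_id                              # plain, non-array job

    print("export SLURM_LUSTRE_JOBID=%s" % magic)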

There's another method planned for handling the jobid. Right now it's mainly 
featured in the kernel staging tree, but it will make its way into the lustre 
tree too.

The idea is to just write your jobid directly into lustre from your prologue 
script (and clear it from the epilogue).

That way you can set it to whatever you like, without ugly messing with shell 
variables (and equally ugly parsing of those variables from the kernel!).
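
The prologue/epilogue pair then stays very small. A rough sketch in Python - 
the /proc/fs/lustre/jobid_name path is an assumption based on the staging-tree 
interface, and the final lustre-tree version may look different:

    # Prologue: push the job id straight into the lustre client so jobstats
    # can pick it up. Epilogue (run with --clear): write an empty id again.
    # The file location is assumed; adjust to whatever the patch exposes.
    import os
    import sys

    JOBID_FILE = "/proc/fs/lustre/jobid_name"    # assumed location

    def set_lustre_jobid(jobid):
        with open(JOBID_FILE, "w") as f:
            f.write(jobid + "\n")

    if __name__ == "__main__":
        if "--clear" in sys.argv:                # epilogue
            set_lustre_jobid("")
        else:                                    # prologue
            set_lustre_jobid(os.environ.get("SLURM_JOB_ID", "unknown"))

One caveat with any per-node setting like this: if several jobs share a node 
they will overwrite each other's id, so it only really distinguishes jobs on 
exclusively allocated nodes.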

For some reason I cannot find the corresponding master patch, though I have a 
passing memory of writing it, so this needs to be addressed separately.

Bye,
    Oleg
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
