Re: [slurm-users] Running an MPI job across two partitions

2020-03-24 Thread Chris Samuel

On 23/3/20 8:32 am, CB wrote:

> I've looked at the heterogeneous job support but it creates two separate
> jobs.


Yes, but the web page does say:

# By default, the applications launched by a single execution of
# the srun command (even for different components of the
# heterogeneous job) are combined into one MPI_COMM_WORLD with
# non-overlapping task IDs.

So it _should_ work.

I know there are issues with Cray systems & hetjobs at the moment, but I 
suspect that's not likely to concern you.


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



[slurm-users] Slurm - MariaDB error

2020-03-24 Thread Dhumal, Dr. Nilesh
Hello,
I am installing Slurm on CentOS. I installed all supporting libraries 
successfully. I also installed MariaDB before installing Slurm.
I get the following error from sudo rpmbuild -ta slurm-20.02.0.tar.bz2:

error: File not found: /root/rpmbuild/BUILDROOT/slurm-20.02.0-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so

RPM build errors:
File not found: /root/rpmbuild/BUILDROOT/slurm-20.02.0-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so
File not found: /root/rpmbuild/BUILDROOT/slurm-20.02.0-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so

Can you tell me how to resolve this issue?
Thanks,
Nilesh


Nilesh Dhumal

Assistant Professor of Chemistry
SH-430; Department of Chemistry and Physics
Florida Gulf Coast University
10501 FGCU Boulevard South
Fort Myers, FL 33965-6565
Phone: (239) 745-4394
Email: ndhu...@fgcu.edu

http://faculty.fgcu.edu/ndhumal/




Re: [slurm-users] Slurm Perl API use and examples

2020-03-24 Thread Burian, John
Thanks, Yair and Thomas. I’ll check out the wrappers. My interest in this case is 
primarily in job submission and control. I was hoping that by using an API into 
Slurm, I would avoid problems I’ve had in the past with interpreting 
inconsistent exit codes of command-line executables and with parsing output that 
may be mixed between stdout and stderr to understand exactly what happened.
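
A minimal sketch of a wrapper that keeps the exit code, stdout, and stderr apart, assuming Python rather than Perl. The names run_command, CommandResult, and parse_parsable_job_id are hypothetical helpers for illustration, not part of any Slurm API; the only Slurm behaviour relied on is that sbatch --parsable prints the job ID, optionally followed by a semicolon and the cluster name.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CommandResult:
    """Outcome of one command run, with the two output streams kept apart."""
    returncode: int
    stdout: str
    stderr: str


def run_command(argv, timeout=60):
    """Run argv without a shell and capture stdout and stderr separately."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        timeout=timeout,
        check=False,  # report the exit code to the caller instead of raising
    )
    return CommandResult(proc.returncode, proc.stdout, proc.stderr)


def parse_parsable_job_id(stdout):
    """Extract the job ID from `sbatch --parsable` output.

    Accepts either "jobid" or "jobid;cluster" on the first line.
    """
    lines = stdout.strip().splitlines()
    if not lines:
        raise ValueError("no job ID found in empty output")
    return int(lines[0].split(";")[0])
```

With this, a submission could look like run_command(["sbatch", "--parsable", "job.sh"]) (job.sh being a placeholder script name), followed by a check of returncode before the job ID is parsed from stdout; anything written to stderr stays in its own field.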

John


From: slurm-users on behalf of Yair Yarom
Reply-To: Slurm User Community List 
Date: Tuesday, March 24, 2020 at 6:05 AM
To: Slurm User Community List 
Subject: Re: [slurm-users] Slurm Perl API use and examples


I also haven't got along with the Perl API shipped with slurm. I got it to 
work, but there were things missing.
Currently I have some wrapper functions for most of the slurm commands, and a 
general parsing function for slurm's common outputs (of scontrol, sacctmgr, 
etc.).
It is not on CPAN, but you can see it in the cshuji::Slurm module in:
https://github.com/irush-cs/slurm-scripts/

I haven't checked it yet, but now with the slurm rest API, I think getting the 
information should be simpler.

HTH,
Yair.


On Mon, Mar 23, 2020 at 10:27 PM Thomas M. Payerle <paye...@umd.edu> wrote:
I was never able to figure out how to use the Perl API shipped with Slurm, but 
instead have written some wrappers around some of the Slurm commands for Perl.  
My wrappers for the sacctmgr and sshare commands are available at CPAN:
https://metacpan.org/release/Slurm-Sacctmgr
https://metacpan.org/release/Slurm-Sshare
(I have similar wrappers for a few other commands. They are not polished enough 
for a CPAN release, but I am willing to share them if you contact me.)

On Mon, Mar 23, 2020 at 3:49 PM Burian, John <john.bur...@nationwidechildrens.org> wrote:
I have some questions about the Slurm Perl API
- Is it still actively supported? I see it's still in the source in Git.
- Does anyone use it? If so, do you have a pointer to some example code?

My immediate question is, for methods that take a data structure as an input 
argument, how does one define that data structure? In Perl, it's just a hash, 
am I supposed to populate the keys of the hash by reading the matching C 
structure in slurm.h? Or do I only need to populate the keys that I care to 
provide a value for, and Slurm assigns defaults to the other keys/fields? 
Thanks,

--
John Burian
Senior Systems Programmer, Technical Lead
Institutional High Performance Computing
Abigail Wexner Research Institute, Nationwide Children’s Hospital



--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads   paye...@umd.edu
5825 University Research Park   (301) 405-6135
University of Maryland
College Park, MD 20740-3831



Re: [slurm-users] Slurm Perl API use and examples

2020-03-24 Thread Marcus Wagner

In fact, we ARE using the perl API, but there are some flaws.

E.g. the array_task_str of the jobinfo structure. Slurm abbreviates long 
lists of array indices, as scontrol does:


e.g.
1-3,5-8,45-...

Yes, you really do find three dots there. In my opinion, it is OK for 
a general tool like scontrol to abbreviate its output, but it does not 
make any sense within an API.
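
To illustrate why the abbreviation hurts an API consumer, here is a minimal Python sketch that expands such a string and reports whether it was cut off. The function name expand_array_task_str is hypothetical. It assumes the string uses the comma and range notation shown above, optionally with a ":step" suffix on a range and a "%limit" suffix on the whole string as accepted by sbatch --array, and that truncation happens at an element boundary.

```python
def expand_array_task_str(task_str):
    """Expand a string such as "1-3,5-8" into a list of task IDs.

    Returns (ids, truncated). truncated is True when an element ends in
    "...", in which case the full list cannot be recovered from the string.
    """
    # Drop an optional "%N" run-limit suffix, e.g. "1-100%5" (assumed format).
    spec = task_str.split("%", 1)[0]
    ids = []
    truncated = False
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if part.endswith("..."):
            # Keep the known start of the cut-off range, e.g. "45-..." -> 45.
            truncated = True
            part = part[:-3].rstrip("-")
            if not part:
                continue
        step = 1
        if ":" in part:
            # Optional step, e.g. "0-10:2" (assumed format).
            part, step_text = part.split(":", 1)
            step = int(step_text)
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            ids.extend(range(first, last + 1, step))
        else:
            ids.append(int(part))
    return ids, truncated
```

For the example above, the caller learns the IDs up to 45 and that the list is incomplete, but nothing about what followed, which is exactly the information an API user would need.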


Does anyone know if the C API also abbreviates things like that?


Best
Marcus

On 23.03.2020 at 20:47, Burian, John wrote:

I have some questions about the Slurm Perl API
- Is it still actively supported? I see it's still in the source in Git.
- Does anyone use it? If so, do you have a pointer to some example code?

My immediate question is, for methods that take a data structure as an input 
argument, how does one define that data structure? In Perl, it's just a hash, 
am I supposed to populate the keys of the hash by reading the matching C 
structure in slurm.h? Or do I only need to populate the keys that I care to 
provide a value for, and Slurm assigns defaults to the other keys/fields? 
Thanks,





Re: [slurm-users] Running an MPI job across two partitions

2020-03-24 Thread CB
Hi Michael,

Thanks for the comment.

I was just checking if there is any other way to do the job before
introducing another partition.
So it appears to me that creating a new partition is the way to go.

Thanks,
Chansup

On Mon, Mar 23, 2020 at 1:25 PM Renfro, Michael  wrote:

> Others might have more ideas, but anything I can think of would require a
> lot of manual steps to avoid mutual interference with jobs in the other
> partitions (allocating resources for a dummy job in the other partition,
> modifying the MPI host list to include nodes in the other partition, etc.).
>
> So why not make another partition encompassing both sets of nodes?
>
> > On Mar 23, 2020, at 10:58 AM, CB  wrote:
> >
> > Hi Andy,
> >
> > Yes, they are on the same network fabric.
> >
> > Sure, creating another partition that encompasses all of the nodes of the
> > two or more partitions would solve the problem.
> > I am wondering if there are any other ways instead of creating a new
> > partition?
> >
> > Thanks,
> > Chansup
> >
> >
> > On Mon, Mar 23, 2020 at 11:51 AM Riebs, Andy  wrote:
> > When you say “distinct compute nodes,” are they at least on the same
> > network fabric?
> >
> > If so, the first thing I’d try would be to create a new partition that
> > encompasses all of the nodes of the other two partitions.
> >
> > Andy
> >
> > From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On
> > Behalf Of CB
> > Sent: Monday, March 23, 2020 11:32 AM
> > To: Slurm User Community List
> > Subject: [slurm-users] Running an MPI job across two partitions
> >
> > Hi,
> >
> > I'm running Slurm 19.05 version.
> >
> > Is there any way to launch an MPI job on a group of distributed nodes
> > from two or more partitions, where each partition has distinct compute
> > nodes?
> >
> > I've looked at the heterogeneous job support but it creates two separate
> > jobs.
> >
> > If there is no such capability with the current Slurm, I'd like to hear
> > any recommendations or suggestions.
> >
> > Thanks,
> >
> > Chansup
>
>


[slurm-users] Long delay between updates of sshare

2020-03-24 Thread Pascal Klink
Hi everyone,

We recently started using priority-based scheduling, and after solving some 
final issues (see this post: 
https://groups.google.com/forum/m/#!topic/slurm-users/N8r8MoyjQAU), everything 
seems to be running quite smoothly now. However, we realized that the data 
shown by sshare, e.g.

Account     User       RawShares  NormShares  RawUsage  EffectvUsage  FairShare

root                              0.00        8484544   1.00
 root       root       1          0.50        0         0.00          1.00
 iasteam               1          0.50        8484544   1.00
  iasteam   carvalho   1          0.25        1550368   0.182729      0.40
  iasteam   hany       1          0.25        0         0.00          0.80
  iasteam   pascal     1          0.25        6934176   0.817271      0.20
  iasteam   stark      1          0.25        0         0.00          0.80

is only updated at very long intervals. This means that the current RawUsage of, 
e.g., user 'pascal' stays at 6934176 for a very long time and then jumps to the 
next value, say 7238923, where it again stays for a long time until it is 
updated. In contrast, the data shown by sacct is updated every 
second.
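
One way to measure the update interval rather than eyeball it is to sample sshare periodically and compare RawUsage between samples. The following is a minimal Python sketch under the assumption that the text comes from pipe-delimited output (as produced by sshare -a -P) with the column names shown in the table above; parse_sshare, raw_usage_by_user, and changed_users are hypothetical helper names.

```python
import csv
import io


def parse_sshare(text):
    """Parse pipe-delimited sshare output into a list of dicts keyed by column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    rows = []
    for row in reader:
        # Skip the None key that csv uses for surplus fields; strip padding.
        rows.append({k: (v or "").strip() for k, v in row.items() if k is not None})
    return rows


def raw_usage_by_user(rows):
    """Map user name to RawUsage; rows without a user (account rows) are skipped.

    Assumes each user appears under one account only.
    """
    return {r["User"]: int(r["RawUsage"]) for r in rows if r.get("User")}


def changed_users(before, after):
    """Return the users whose RawUsage differs between two samples."""
    return sorted(u for u in after if before.get(u) != after[u])
```

Logging a timestamp whenever changed_users returns a non-empty list for two consecutive samples would show how often the values really move.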

We already tried reducing the update interval of sshare by adjusting the 
JobAcctGatherFrequency, but this did not help in our case. Also my attempts to 
look for similar questions had no success. Can anybody help us out here and 
point us to the correct option that we need to change to get everything running 
smoothly?

P.S.: Our config is the same as in the post that I linked (except for the 
proposed fix in the corresponding thread obviously).

Best,
Pascal




Re: [slurm-users] Slurm Perl API use and examples

2020-03-24 Thread Yair Yarom
I also haven't got along with the Perl API shipped with slurm. I got it to
work, but there were things missing.
Currently I have some wrapper functions for most of the slurm commands, and a
general parsing function for slurm's common outputs (of scontrol, sacctmgr,
etc.).
It is not on CPAN, but you can see it in the cshuji::Slurm module in:
https://github.com/irush-cs/slurm-scripts/

I haven't checked it yet, but now with the slurm rest API, I think getting
the information should be simpler.

HTH,
Yair.


On Mon, Mar 23, 2020 at 10:27 PM Thomas M. Payerle  wrote:

> I was never able to figure out how to use the Perl API shipped with Slurm,
> but instead have written some wrappers around some of the Slurm commands
> for Perl.  My wrappers for the sacctmgr and sshare commands are available at
> CPAN:
> https://metacpan.org/release/Slurm-Sacctmgr
> https://metacpan.org/release/Slurm-Sshare
> (I have similar wrappers for a few other commands. They are not polished
> enough for a CPAN release, but I am willing to share them if you contact me.)
>
> On Mon, Mar 23, 2020 at 3:49 PM Burian, John <
> john.bur...@nationwidechildrens.org> wrote:
>
>> I have some questions about the Slurm Perl API
>> - Is it still actively supported? I see it's still in the source in Git.
>> - Does anyone use it? If so, do you have a pointer to some example code?
>>
>> My immediate question is, for methods that take a data structure as an
>> input argument, how does one define that data structure? In Perl, it's just
>> a hash, am I supposed to populate the keys of the hash by reading the
>> matching C structure in slurm.h? Or do I only need to populate the keys
>> that I care to provide a value for, and Slurm assigns defaults to the other
>> keys/fields? Thanks,
>>
>> --
>> John Burian
>> Senior Systems Programmer, Technical Lead
>> Institutional High Performance Computing
>> Abigail Wexner Research Institute, Nationwide Children’s Hospital
>>
>>
>>
>
> --
> Tom Payerle
> DIT-ACIGS/Mid-Atlantic Crossroads   paye...@umd.edu
> 5825 University Research Park   (301) 405-6135
> University of Maryland
> College Park, MD 20740-3831
>


Re: [slurm-users] Accounting Information from slurmdbd does not reach slurmctld

2020-03-24 Thread Pascal Klink
Hi Sean, Hi Marcus,

Changing from localhost to the actual IP seems to have solved the problem. Is 
that because not only the slurmctld process on the control node but also the 
slurmd processes on the compute nodes need to have access to the accounting 
information?

Because although slurmdbd and slurmctld are running on the same computer, 
obviously the compute nodes are on different machines.

Anyways, thank you very much for the help. We still experience some delays 
until the information in sshare is updated with the data shown by sacct, but I 
will create a different thread for that.

Best 
Pascal