The nodes allocated for salloc could also be configured to allow oversubscription, leaving them overloaded.

There are a number of tools that can be used to study task performance bottlenecks on HPC clusters, including:

  • Slurm's built-in profiling: Slurm itself can collect per-job performance data, for example through the sstat and sacct commands and, where the acct_gather_profile plugin is configured, detailed time-series profiles that sh5util can extract. This data can help identify bottlenecks in job execution, such as slow nodes, I/O bottlenecks, and memory contention (see the sketch after this list).
  • Ganglia: Ganglia is a monitoring system that collects performance data across an HPC cluster. This data can reveal cluster-level bottlenecks, such as overloaded nodes, high network traffic, and slow storage devices.
  • perf: The Linux perf tool (and the related perf-tools utilities) collects performance data for Linux systems, such as CPU usage, memory usage, and I/O activity, which can expose system-level bottlenecks.
  • Intel VTune Profiler: VTune Profiler (formerly VTune Amplifier) is a tool from Intel that collects performance data for Intel processors. It can identify processor-level bottlenecks such as cache misses, branch mispredictions, and memory latency.
  • HPCToolkit: HPCToolkit is a suite of tools for collecting performance data on HPC applications. It can help identify application-level bottlenecks, such as inefficient algorithms, memory problems, and threading issues.
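
As a minimal sketch of the Slurm-side profiling mentioned in the first bullet, assuming accounting is enabled, the hdf5 profile plugin is configured for sh5util, and a hypothetical job ID 12345:

    # Live per-step resource usage of a running job (hypothetical job ID 12345)
    sstat -j 12345 --format=JobID,AveCPU,AveRSS,MaxRSS,MaxDiskRead,MaxDiskWrite

    # Merge the per-node HDF5 profiles of a finished job into one file
    sh5util -j 12345 -o job_12345_profile.h5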

The best tool for a particular situation will depend on the specific needs of the user. However, all of the tools listed above can be used to identify task performance bottlenecks on HPC clusters.

In addition to these tools, there are a number of other things that can be done to study task performance bottlenecks on HPC clusters. These include:

  • Reviewing the job submission scripts: Check that the scripts request the correct resources and submit the jobs to the intended nodes and partitions.
  • Monitoring the job execution: Track the progress of running jobs to spot potential problems early.
  • Analyzing the performance data: Use the collected data to pinpoint the specific bottlenecks that are limiting job performance.
  • Tuning the jobs: Adjust job parameters, algorithms, or libraries to improve performance (see the sketch after this list).
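
A minimal sketch of the analyzing and tuning steps, assuming Slurm accounting is enabled and using a hypothetical job ID 12345:

    # Compare requested vs. consumed resources for a finished job
    sacct -j 12345 --format=JobID,Elapsed,AllocCPUS,TotalCPU,MaxRSS,ReqMem

    # If the optional seff utility (from Slurm's contribs) is installed,
    # it summarizes CPU and memory efficiency in one report
    seff 12345

A large gap between requested and used CPU or memory is a common signal that the job's parameters need tuning.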

By taking these steps, it is possible to identify and address task performance bottlenecks on HPC clusters. This can help to improve the performance of the jobs and to get the most out of the HPC resources.



On Jul 4, 2023, at 9:04 AM, Татьяна Озерова <tanyaozerova1...@gmail.com> wrote:


Thank you for your answer! And if the Slurm workers are identical, what could be the reason? Can interactive mode affect the performance? I have submitted the task with "srun {{ name_of_task }} --pty bash", and the result is the same as when launching with salloc. Thanks in advance!


On Tue, Jul 4, 2023 at 15:51, Mike Mikailov <mmikai...@gmail.com> wrote:
They should not affect the task performance.

Maybe the cluster configuration allocates slower machines for salloc jobs.
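
One way to check is to compare the nodes each submission method actually received. A rough sketch, assuming a hypothetical job ID 12345:

    # Which nodes did the job land on, and with how many CPUs?
    scontrol show job 12345 | grep -E 'NodeList|NumCPUs'

    # Compare node types, CPU counts, memory, and state across the partition
    sinfo -N -l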

salloc and sbatch have different purposes:

  • salloc is used to allocate a set of resources to a job. Once the resources have been allocated, the user can run a command or script on the allocated resources.
  • sbatch is used to submit a batch script to Slurm. The batch script contains a list of commands or scripts that will be executed on the allocated resources.

In general, salloc is used for jobs that need to be run interactively, such as jobs that require a shell or jobs that need to be debugged. sbatch is used for jobs that can be run in the background, such as long-running jobs or jobs that are submitted by a queuing system.
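
For example, a typical interactive workflow allocates resources with salloc and then launches work inside the allocation with srun, while sbatch hands the whole script to the scheduler. A minimal sketch (the node counts and ./my_app are placeholders):

    # Interactive: allocate resources, then run commands inside the allocation
    salloc -N 1 -n 4
    srun ./my_app    # runs on the allocated node(s)
    exit             # release the allocation

    # Batch: the script runs unattended once resources become free
    sbatch my_job.sh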

Here is a table that summarizes the key differences between salloc and sbatch:

Feature          salloc                                           sbatch
Purpose          Allocate resources and run a command or script   Submit a batch script
Interactive      Yes                                              No
Background       No                                               Yes
Queuing system   No                                               Yes

Here are some examples of how to use salloc and sbatch:

  • To allocate 2 nodes with 4 CPUs per task and run the command ls, you would use the following command:

        salloc -N 2 -c 4 ls

  • To submit a batch script called my_job.sh that contains the command python my_script.py, you would use the following command:

        sbatch my_job.sh
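
For reference, a minimal my_job.sh for the example above might look like this (the resource requests are illustrative only):

    #!/bin/bash
    #SBATCH --job-name=my_job
    #SBATCH --nodes=1
    #SBATCH --time=00:10:00

    python my_script.py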

For more information on salloc and sbatch, please see the following documentation:

  • https://slurm.schedmd.com/salloc.html
  • https://slurm.schedmd.com/sbatch.html


On Jul 4, 2023, at 8:22 AM, Татьяна Озерова <tanyaozerova1...@gmail.com> wrote:


Hello! I have a question about the way tasks are launched in Slurm. I use the service in the cloud and submit an application with sbatch or salloc. As far as I understand, the commands are similar: they allocate resources for users' tasks and run them. However, I have received different results in cluster performance for the same task (the task execution time is much longer in the case of salloc). So my question is: what is the difference between these two commands that can affect task performance? Thank you in advance.
