Dear All,
Thanks for the mailing list; I just joined. I'd like to introduce myself: my
name is Joseph John, and I work as a system administrator. I have been working
on Linux, but I am a novice to HPC and Slurm, and am trying to learn.
Thanks,
Joseph John
Well again, I don't want to tweak things just to get the test to happen
quicker. I DO have to keep in mind the scheduler and backfill settings,
though. For instance, I think the default scheduler and backfill interval is
60 and 30 seconds...or vice versa. So, before I check the Scheduler valu
You can get some information on that from sdiag, and there are tweaks you can
make to backfill scheduling that affect how quickly it will get to a job.
That doesn’t really answer your real question, but might help you when you are
looking into this.
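To illustrate the sdiag suggestion: below is a small sketch that pulls the backfilled-job counter out of sdiag-style text. The section and field names mirror sdiag's "Backfilling stats" output, but the sample text and every number in it are invented here, not taken from a real system.

```shell
# Hypothetical sdiag-style output; field names follow sdiag's
# "Backfilling stats" section, but the numbers are invented.
sdiag_sample='Main schedule statistics (microseconds):
	Last cycle:   55
Backfilling stats
	Total backfilled jobs (since last slurm start): 1093
	Last cycle:   1200'

# Extract the backfilled-job counter, the kind of value you would
# watch between scheduling passes to see how quickly backfill
# is getting to jobs.
backfilled=$(printf '%s\n' "$sdiag_sample" | awk -F': ' '/Total backfilled jobs/ {print $2}')
echo "backfilled=${backfilled}"
```

On a live cluster you would pipe real `sdiag` output through the same `awk` instead of the sample string.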
On Sep 29, 2023, at 16:1
I'm not looking for a one-time answer. We run these tests anytime we change
anything related to Slurm (version, configuration, etc.). We certainly run
the test after the system comes back up after an outage, and an hour would be a
long time to wait for that. That's certainly the brute-force
On Sep 29, 2023, at 2:51 PM, Davide DelVento
<davide.quan...@gmail.com> wrote:
I don't really have an answer for you other than a "hallway comment", that
it sounds like a good thing which I would test with a simulator, if I had
one. I've been intrigued by (but really not looked much into)
https://slurm.schedmd.com/SLUG23/LANL-Batsim-SLUG23.pdf
On Fri, Sep 29, 2023 at 10:05 A
FWIW:
The answer to the question "Can the make command utilize the resources
allocated to a Slurm job?" [1] is out of date. The patch mentioned is no
longer in the distribution, and personal inspection finds that it no longer
applies to newer versions of GNU Make.
[1] https://slurm.schedmd.
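Since the patched Make is gone, a common workaround (a sketch of the general idiom, not the FAQ's method) is to drive an unpatched `make -j` from the job's CPU allocation in the batch script:

```shell
#!/bin/bash
# Sketch: let plain GNU Make parallelism follow the Slurm allocation.
# SLURM_CPUS_PER_TASK is only exported when the job requests
# --cpus-per-task, so fall back to 1 when it is unset (e.g. outside
# a Slurm job entirely).
jobs="${SLURM_CPUS_PER_TASK:-1}"
echo "would run: make -j ${jobs}"
```

Inside an sbatch script you would replace the `echo` with the real `make -j "${jobs}"` invocation.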
On 29-09-2023 17:33, Ryan Novosielski wrote:
I’ll just say, we haven’t done an online/jobs running upgrade recently
(in part because we know our database upgrade will take a long time, and
we have some processes that rely on -M), but we have done it and it does
work fine. So the paranoia isn’t
On our system, for some partitions, we guarantee that a job can run at least an
hour before being preempted by a higher priority job. We use the QOS preempt
exempt time for this, and it appears to be working. But of course, I want to
TEST that it works.
So on a test system, I start a lower pr
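For reference, the exemption described above is usually attached to the QOS via sacctmgr; the sketch below assumes a QOS named "normal" and a one-hour value (both assumptions, not taken from this thread), and requires a live slurmdbd, so it is shown as a command fragment only:

```
# Guarantee jobs in this QOS one hour of runtime before preemption:
sacctmgr modify qos normal set PreemptExemptTime=01:00:00

# Check that the setting took:
sacctmgr show qos normal format=Name,PreemptExemptTime
```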
My team lead brought that up also, that we could go ahead and change the
symlink that EVERYTHING uses, and nothing would happen...until the service is
restarted. That's good that it's not a timing-related change. Of course, we
do run the risk that a node will reboot on its own, and t
Hi list
I have built a small cluster and have attached a few clients to it.
My clients can submit jobs, so I am confident that the service is set up
sufficiently.
What I would like to do is to deploy the slurm client into a docker container.
From within the docker container, I have set up munge and
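A rough sketch of what a containerized client typically needs (the image name "slurm-client" is hypothetical, and this requires a running Docker daemon, so it is a fragment only): the container must share the cluster's munge.key and slurm.conf, and munged has to be running before any client command can authenticate.

```
docker run --rm \
  -v /etc/munge/munge.key:/etc/munge/munge.key:ro \
  -v /etc/slurm/slurm.conf:/etc/slurm/slurm.conf:ro \
  slurm-client \
  bash -c 'runuser -u munge -- munged && sinfo'
```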
This is one of the reasons we stick with using RPMs rather than the
symlink process. It's just cleaner and avoids the issue of having the
install on shared storage that may get overwhelmed with traffic or
suffer outages. Also the package manager automatically removes the
previous versions and
I did already see the upgrade section of Jason's talk, but it wasn't much about
the mechanics of the actual upgrade process; it seemed more big-picture. It
dealt a lot with running different parts of Slurm at different versions, which is
something we don't have.
One little wrinkle here is that wh
Fantastic, this is really helpful, thanks!
On Thu, Sep 28, 2023 at 12:05 PM Paul Edmon wrote:
> Yes, it was later than that. If you are on 23.02 you are good. We've been
> running with storing job_scripts on for years at this point and that part
> of the database only uses up 8.4G. Our entire data