Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
I added that line and restarted the service via

# systemctl restart slurmctld

However, I still get the same error.
Moreover, when I salloc, I don't see a slurm/ directory in the cgroup path:

[shams@hpc ~]$ salloc
salloc: Granted job allocation 293
[shams@hpc ~]$ bin/show_my_cgroup --debug
bash: bin/show_my_cgroup: No such file or directory
[shams@hpc ~]$ cd /sys/fs/cgroup/memory/
machine.slice/ system.slice/  user.slice/
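
(For reference, one way to see which cgroup a process actually lands in,
assuming cgroup v1 as in the paths above, is to read its /proc entry from
inside the allocation:)

[shams@hpc ~]$ srun cat /proc/self/cgroup | grep memory
# with task/cgroup active this should end in something like
# /slurm/uid_<uid>/job_293/step_0 rather than a plain system/user slice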


Regards,
Mahmood




On Sat, Jan 25, 2020 at 12:20 AM Mark Hahn  wrote:

> why not just try it?
> you already know that many, many sites use Slurm happily,
> and it's not as if blast is something exotic and newfangled.
>


Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
>depends on whether "ConstrainSwapSpace=yes" appears in cgroup.conf.

Thanks for the detail.
On the head node, mine is

# cat cgroup.conf
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=no
ConstrainRAMSpace=no


Is that the root of the problem?

Regards,
Mahmood


Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mark Hahn



apologies for a long response; didn't have time for a shorter one ;)


>> you have it backwards.  slurm creates a cgroup for the job (step)
>> and uses the cgroup control to tell the kernel how much memory to
>> permit the job-step to use.

> I would like to know how I can increase the threshold in the slurm config
> files. I cannot find it.


maybe I'm not being clear.  if you enable cgroups in slurm.conf,
for the simple case of a single node, single-step job,
slurm creates a cgroup tree for the job (also identified by uid),
and the process it starts on your behalf (running the job script)
is controlled by the *.limit_in_bytes settings:

[hahn@gra-login3 ~]$ salloc
...
salloc: Granted job allocation 26782102
[hahn@gra796 ~]$ bin/show_my_cgroup --debug
...
gra796:14630: DEBUG pid=14630 find_cgroup(14630,memory) =
/sys/fs/cgroup/memory/slurm/uid_3000566/job_26782102/step_0
gra796:14630: memory.usage_in_bytes=8822784
gra796:14630: memory.limit_in_bytes=268435456
gra796:14630: memory.memsw.usage_in_bytes=8822784
gra796:14630: memory.memsw.limit_in_bytes=268435456
...
[hahn@gra796 ~]$ cd /sys/fs/cgroup/memory/slurm/uid_3000566/job_26782102/step_0
[hahn@gra796 step_0]$ ls -l
...
-rw-r--r-- 1 root root 0 Jan 24 14:43 memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 24 14:43 memory.max_usage_in_bytes
...
-rw-r--r-- 1 root root 0 Jan 24 14:43 memory.memsw.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 24 14:43 memory.memsw.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan 24 14:43 memory.memsw.usage_in_bytes
...
[hahn@gra796 step_0]$ cat memory.memsw.limit_in_bytes 
268435456
[hahn@gra796 step_0]$ cat memory.limit_in_bytes 
268435456
[hahn@gra796 step_0]$ cat memory.kmem.limit_in_bytes 
9223372036854771712


in other words, on this system, such a job defaults to 256M (because I didn't
salloc with --mem), and the cgroup that controls the job step's processes
(in this case, step_0) is found in a specific sub-cgroup.


if I, as root, came in while that step was executing and wrote a different
number to one of the limits, I could expand or contract the limit that the
kernel enforces on the cgroup.  for instance, I could leave RSS limited but
allow the cgroup more VM just by expanding memsw.limit_in_bytes.
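
concretely, that would look something like this (the path is the example job
above, and the value is made up):

# as root, on the node, while the step is running:
echo $((512*1024*1024)) > \
  /sys/fs/cgroup/memory/slurm/uid_3000566/job_26782102/step_0/memory.memsw.limit_in_bytes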


of course, this is a bad idea: the reason you have Slurm around is to make
the right settings in the first place!  you get to choose whether the
RSS limit (memory.limit_in_bytes) is the same as the VSZ limit 
(memory.memsw.limit_in_bytes).


that distinction seems to be your whole issue.  mmapping a file increases VSZ,
but not necessarily RSS, and VSZ can easily and safely go vastly beyond
physical memory since you're using mmap to read a large file.
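
an easy way to see that difference on a running blastx (the pid here is
hypothetical):

$ ps -o pid,vsz,rss,comm -p 12345
# VSZ counts every mapping, including mmap'ed file pages that haven't been
# touched yet; RSS counts only pages actually resident in RAM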



> According to [1], " No value is provided by cgroups for virtual memory size
> ('vsize') "
>
> [1] https://slurm.schedmd.com/slurm.conf.html


depends on whether "ConstrainSwapSpace=yes" appears in cgroup.conf.
(it's yes on the system above)
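
for reference, a cgroup.conf that enforces both looks roughly like this;
it's only a sketch, the values are site policy, and task/cgroup must also be
listed in TaskPlugin in slurm.conf (with slurmd restarted on the nodes) for
any of it to take effect:

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedSwapSpace=0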

regards,
--
Mark Hahn | SHARCnet Sysadmin | h...@sharcnet.ca | http://www.sharcnet.ca
  | McMaster RHPCS| h...@mcmaster.ca | 905 525 9140 x24687
  | Compute/Calcul Canada| http://www.computecanada.ca



Re: [slurm-users] Question about slurm source code and libraries

2020-01-24 Thread Dean Schulze
That's a different project.  I'm asking if there is a REST client library
for making REST calls in the slurm source code
https://github.com/SchedMD/slurm.

On Fri, Jan 24, 2020 at 12:35 PM Renfro, Michael  wrote:

> The slurm-web project [1] has a REST API [2]. Never used it myself, just
> used the regular web frontend for viewing queue and node state.
>
> [1] https://edf-hpc.github.io/slurm-web/index.html
> [2] https://edf-hpc.github.io/slurm-web/api.html
>
> > On Jan 24, 2020, at 1:22 PM, Dean Schulze 
> wrote:
> >
> > Since there isn't a list for slurm development I'll ask here.  Does the
> slurm code include a library for making REST calls?  I'm writing a plugin
> that will make REST calls and if slurm already has one I'll use that,
> otherwise I'll find one with an appropriate open source license for my
> plugin.
> >
> > Thanks.
>
>
>


Re: [slurm-users] Question about slurm source code and libraries

2020-01-24 Thread Renfro, Michael
The slurm-web project [1] has a REST API [2]. Never used it myself, just used 
the regular web frontend for viewing queue and node state.

[1] https://edf-hpc.github.io/slurm-web/index.html
[2] https://edf-hpc.github.io/slurm-web/api.html

> On Jan 24, 2020, at 1:22 PM, Dean Schulze  wrote:
> 
> Since there isn't a list for slurm development I'll ask here.  Does the slurm 
> code include a library for making REST calls?  I'm writing a plugin that will 
> make REST calls and if slurm already has one I'll use that, otherwise I'll 
> find one with an appropriate open source license for my plugin.
> 
> Thanks.




[slurm-users] Question about slurm source code and libraries

2020-01-24 Thread Dean Schulze
Since there isn't a list for slurm development I'll ask here.  Does the
slurm code include a library for making REST calls?  I'm writing a plugin
that will make REST calls and if slurm already has one I'll use that,
otherwise I'll find one with an appropriate open source license for my
plugin.

Thanks.


Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
> you have it backwards.  slurm creates a cgroup for the job (step)
> and uses the cgroup control to tell the kernel how much memory to
> permit the job-step to use.


I would like to know how I can increase the threshold in the slurm config
files. I cannot find it.
According to [1], " No value is provided by cgroups for virtual memory size
('vsize') "

[1] https://slurm.schedmd.com/slurm.conf.html

Regards,
Mahmood


Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
>how much memory are you requesting from Slurm in your job?
#SBATCH --mem=38GB

also,

# sacctmgr list association format=user,grptres%30 | grep shams
 shams cpu=10,mem=40G
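
(For completeness, what Slurm actually applied to a given job can be checked
with something like the following; the job id is just an example:)

# scontrol show job 284 | grep -i tres
# sacct -j 284 --format=JobID,ReqMem,MaxRSS,State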


Regards,
Mahmood


Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mark Hahn

> Excuse me, I was confused about that.
> While the cgroup value is 68GB, when I run on the terminal I see the VSZ is
> about 80GB and the program runs normally.
> However, with slurm on that node, I can not run.

how much memory are you requesting from Slurm in your job?

> Why can I run it from the terminal, but not via slurm?

the purpose of slurm is to allocate resources.  logging into a node "bare"
is "evading" everything slurm does.

> I wonder if slurm gets the right value from the kernel's cgroup.

you have it backwards.  slurm creates a cgroup for the job (step)
and uses the cgroup control to tell the kernel how much memory to
permit the job-step to use.

> I would like to solve the problem locally for blast; I am not seeking a
> system-wide solution right now.

there's nothing unique about your system or blast (which is extremely common
on many large slurm installs).

regards, mark hahn



Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
Excuse me, I was confused about that.
While the cgroup value is 68GB, when I run on the terminal I see the VSZ is
about 80GB and the program runs normally.
However, with slurm on that node, I can not run.

Why can I run it from the terminal, but not via slurm?
I wonder if slurm gets the right value from the kernel's cgroup.


I would like to solve the problem locally for blast; I am not seeking a
system-wide solution right now.

Regards,
Mahmood




On Fri, Jan 24, 2020 at 8:45 PM Mark Hahn  wrote:

>
> of course not.  "usage_in_bytes" is an output parameter.
> your issue is that Slurm is setting at least some of the input
> controls such as memory.memsw.limit_in_bytes.  if you want to fight
> with Slurm, you could set the memory.memsw.limit_in_bytes value on
> a "live" cgroup.  (note also that above you're referring to the base
> cgroup, not the cgroup for your job.)  of course, manually fighting
> Slurm is a Fairly Bad Idea.
>
> you should read the documentation on cgroups to understand how these work.
> memsw basically corresponds to VSZ in ps, whereas mem corresponds with RSS.
>
> regards, mark hahn.
>
>


Re: [slurm-users] Srun not setting DISPLAY with --x11 for one account

2020-01-24 Thread William Brown
There are differences for X11 between Slurm versions so it may help to know
which version you have.

I tried some of your commands on our Slurm 19.05.3-2 cluster and, interestingly,
in the session on the compute node I don't see the cookie for the login node.
This was with MobaXterm:

[user@prdubrvm005 ~]$ xauth list
prdubrvm005.research.rcsi.com/unix:10  MIT-MAGIC-COOKIE-1
 2efc5dd851736e3848193f65d038eca8
[user@prdubrvm005 ~]$ srun --pty  --x11  --preserve-env /bin/bash
[user@prdubrhpc1-02 ~]$ xauth list
prdubrhpc1-02.research.rcsi.com/unix:95  MIT-MAGIC-COOKIE-1
 2efc5dd851736e3848193f65d038eca8
[user@prdubrhpc1-02 ~]$ echo $DISPLAY
localhost:95.0

Any per-user problem would make me suspect that the user has a different
shell, or something in their login script.  Can you make their .bashrc and
.bash_profile just exit?  Or look for hidden configuration files for
 in their home directory?
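
Something like the following, run as the affected user, would rule the
startup files out (the file names are the usual bash ones and purely
illustrative):

mv ~/.bashrc ~/.bashrc.off
mv ~/.bash_profile ~/.bash_profile.off
srun --pty --x11 --preserve-env /bin/bash
echo $DISPLAY
# move the files back afterwards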

William



On Fri, 24 Jan 2020 at 16:05, Simon Andrews 
wrote:

> I have a weird problem which I can’t get to the bottom of.
>
>
>
> We have a cluster which allows users to start interactive sessions which
> forward any X11 sessions they generated on the head node.  This generally
> works fine, but on the account of one user it doesn’t work.  The X11
> connection to the head node is fine, but it won’t transfer to the compute
> node.
>
>
>
> The symptoms are shown below:
>
>
>
> A good user gets this:
>
>
>
> [good@headnode ~]$ xauth list
>
> headnode.babraham.ac.uk/unix:12  MIT-MAGIC-COOKIE-1
> f04a2bf9a921a3357e44373655add14a
>
>
>
> [good@headnode ~]$ echo $DISPLAY
>
> localhost:12.0
>
>
>
> [good@headnode ~]$ srun --pty -p interactive --x11  --preserve-env
> /bin/bash
>
>
>
> [good@compute ~]$ xauth list
>
> headnode.babraham.ac.uk/unix:12  MIT-MAGIC-COOKIE-1
> f04a2bf9a921a3357e44373655add14a
>
> compute/unix:25  MIT-MAGIC-COOKIE-1  f04a2bf9a921a3357e44373655add14a
>
>
>
> [good@compute ~]$ echo $DISPLAY
>
> localhost:25.0
>
>
>
> So the cookie is copied from the head node and forwarded and the DISPLAY
> variable is updated.
>
>
>
> The bad user gets this:
>
>
>
> [bad@headnode ~]$ xauth list
>
> headnode.babraham.ac.uk/unix:10  MIT-MAGIC-COOKIE-1
> c39a493a37132d308b37469d363d8692
>
>
>
> [bad@headnode ~]$ echo $DISPLAY
>
> localhost:10.0
>
>
>
> [bad@headnode ~]$ srun --pty -p interactive --x11  --preserve-env
> /bin/bash
>
>
>
> [bad@compute ~]$ xauth list
>
> headnode.babraham.ac.uk/unix:10  MIT-MAGIC-COOKIE-1
> c39a493a37132d308b37469d363d8692
>
>
>
> [bad@compute ~]$ echo $DISPLAY
>
> localhost:10.0
>
>
>
> So the cookie isn’t copied and the DISPLAY isn’t updated.  I can’t see any
> errors in the logs and I can’t see anything different about this account.
>
>
>
> If I do a straightforward ssh -Y from the head node to a compute node
> from the bad account then that works fine – it’s only whatever is specific
> about the way that srun forwards X which fails.
>
>
>
> Any ideas or suggestions for debugging would be appreciated as I’m running
> out of things to try!
>
>
>
> Simon.
>


Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mark Hahn

> I see this
>
> # cat /sys/fs/cgroup/memory/memory.memsw.usage_in_bytes
> 71496372224
>
> which is about 68GB.
> As I said, running from the terminal has no problem.
> Is it fine to set a larger value (130GB) as below?
>
> echo 139586437120 > /sys/fs/cgroup/memory/memory.memsw.usage_in_bytes


of course not.  "usage_in_bytes" is an output parameter.
your issue is that Slurm is setting at least some of the input
controls such as memory.memsw.limit_in_bytes.  if you want to fight
with Slurm, you could set the memory.memsw.limit_in_bytes value on
a "live" cgroup.  (note also that above you're referring to the base
cgroup, not the cgroup for your job.)  of course, manually fighting
Slurm is a Fairly Bad Idea.

you should read the documentation on cgroups to understand how these work.
memsw basically corresponds to VSZ in ps, whereas mem corresponds with RSS.

regards, mark hahn.



Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
I see this

# cat /sys/fs/cgroup/memory/memory.memsw.usage_in_bytes
71496372224

which is about 68GB.
As I said, running from the terminal has no problem.
Is it fine to set a larger value (130GB) as below?

echo 139586437120 > /sys/fs/cgroup/memory/memory.memsw.usage_in_bytes


Regards,
Mahmood




On Fri, Jan 24, 2020 at 7:35 PM Mahmood Naderan 
wrote:

> Yes, it uses a large value for virtual size.
> Since I can run it via terminal (outside of slurm), I think kernel
> parameters are OK.
> In other words, I have to configure slurm for that purpose.
> Which slurm configuration parameter is in charge of that?
>
> Regards,
> Mahmood
>
>


Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
Yes, it uses a large value for virtual size.
Since I can run it via terminal (outside of slurm), I think kernel
parameters are OK.
In other words, I have to configure slurm for that purpose.
Which slurm configuration parameter is in charge of that?

Regards,
Mahmood




On Fri, Jan 24, 2020 at 5:22 PM Jeffrey T Frey  wrote:

> Does your Slurm cgroup or node OS cgroup configuration limit the virtual
> address space of processes?  The "Error memory mapping" is thrown by blast
> when trying to create a virtual address space that exposes the contents of
> a file on disk (see "man mmap") so the file can be accessed via pointers
> (with the OS handling paging data in and out of the file on disk) rather
> than by means of standard file i/o calls (e.g. fread(), fscanf(), read()).
> It sounds like you don't have enough system RAM, period, or the cgroup
> "memory.memsw.limit_in_bytes" is set too low for the amount of file content
> you're attempting to mmap() into the virtual address space (e.g. BIG files).
>
>
>
>


[slurm-users] Srun not setting DISPLAY with --x11 for one account

2020-01-24 Thread Simon Andrews
I have a weird problem which I can't get to the bottom of.

We have a cluster which allows users to start interactive sessions which 
forward any X11 sessions they generated on the head node.  This generally works 
fine, but on the account of one user it doesn't work.  The X11 connection to 
the head node is fine, but it won't transfer to the compute node.

The symptoms are shown below:

A good user gets this:

[good@headnode ~]$ xauth list
headnode.babraham.ac.uk/unix:12  MIT-MAGIC-COOKIE-1  
f04a2bf9a921a3357e44373655add14a

[good@headnode ~]$ echo $DISPLAY
localhost:12.0

[good@headnode ~]$ srun --pty -p interactive --x11  --preserve-env /bin/bash

[good@compute ~]$ xauth list
headnode.babraham.ac.uk/unix:12  MIT-MAGIC-COOKIE-1  
f04a2bf9a921a3357e44373655add14a
compute/unix:25  MIT-MAGIC-COOKIE-1  f04a2bf9a921a3357e44373655add14a

[good@compute ~]$ echo $DISPLAY
localhost:25.0

So the cookie is copied from the head node and forwarded and the DISPLAY 
variable is updated.

The bad user gets this:

[bad@headnode ~]$ xauth list
headnode.babraham.ac.uk/unix:10  MIT-MAGIC-COOKIE-1  
c39a493a37132d308b37469d363d8692

[bad@headnode ~]$ echo $DISPLAY
localhost:10.0

[bad@headnode ~]$ srun --pty -p interactive --x11  --preserve-env /bin/bash

[bad@compute ~]$ xauth list
headnode.babraham.ac.uk/unix:10  MIT-MAGIC-COOKIE-1  
c39a493a37132d308b37469d363d8692

[bad@compute ~]$ echo $DISPLAY
localhost:10.0

So the cookie isn't copied and the DISPLAY isn't updated.  I can't see any 
errors in the logs and I can't see anything different about this account.

If I do a straightforward ssh -Y from the head node to a compute node from the 
bad account then that works fine - it's only whatever is specific about the way 
that srun forwards X which fails.

Any ideas or suggestions for debugging would be appreciated as I'm running out 
of things to try!

Simon.


Re: [slurm-users] Multinode blast run

2020-01-24 Thread Chris Samuel

On 24/1/20 3:46 am, Mahmood Naderan wrote:

> Has anyone run blast on multiple nodes via slurm?


I don't think blast is something that can run across nodes (or at least it
couldn't in the past).  There is/was something called "mpiblast" that
could do that.


If you'll excuse the plug this sounds like a good question for the 
Beowulf list https://www.beowulf.org/ which is a more general purpose 
cluster computing list (disclaimer: I'm the caretaker of it these days).


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] job_submit.lua and memory allocations

2020-01-24 Thread William G. Wichser
Answering my own question, after much help from Josko Plazonic:

local mem_to_use = 0

-- This is the logic -
-- either min_mem_per_node or min_mem_per_cpu will be set
-- Both can't be set, so only act in those two cases

if job_desc.min_mem_per_node ~= nil then
 mem_to_use = job_desc.min_mem_per_node
end
if job_desc.min_mem_per_cpu ~= nil then
 mem_to_use = job_desc.min_mem_per_cpu * job_desc.min_cpus
end

log_info("slurm_job_submit: Got total memory: %d", mem_to_use)



Bill


On 1/24/20 8:52 AM, William G. Wichser wrote:
> Resurrecting an older thread where I need to obtain the value for memory
> in a submitted job.  Turns out this is not an easy case with the method
> I'm trying to use, so I hope that there is just some variable I am overlooking.
> 
> The trivial case was simply to look at job_desc.pn_min_memory.  And this
> works fine as long as jobs are submitted with a --mem= flag.  But there
> are two other ways that jobs get submitted which make this value
> something like 2^63.
> 
> The first is when no memory is specified and users rely on the default.
> The second is with --mem-per-cpu=X
> 
> 
> For that second case I can detect using
> (job_desc.pn_min_memory - slurm.MEM_PER_CPU) * job_desc.min_cpus
> 
> But I find that when users are using the default memory allocation, it
> isn't so easy to detect since it appears that both of the memory values
> are set to 2^63 or close to that number.  Maybe it's 2^64 -1.  Whatever.
> 
> I just feel that there has to be a better way!  Is there something that
> I'm missing?  Perhaps a tres.memory or something which has the right
> value when in job_submit.lua?
> 
> Thanks,
> Bill
> 


[slurm-users] job_submit.lua and memory allocations

2020-01-24 Thread William G. Wichser
Resurrecting an older thread where I need to obtain the value for memory 
in a submitted job.  Turns out this is not an easy case with the method 
I'm trying to use, so I hope that there is just some variable I am overlooking.

The trivial case was simply to look at job_desc.pn_min_memory.  And this 
works fine as long as jobs are submitted with a --mem= flag.  But there 
are two other ways that jobs get submitted which make this value 
something like 2^63.

The first is when no memory is specified and users rely on the default. 
The second is with --mem-per-cpu=X


For that second case I can detect using
(job_desc.pn_min_memory - slurm.MEM_PER_CPU) * job_desc.min_cpus

But I find that when users are using the default memory allocation, it 
isn't so easy to detect since it appears that both of the memory values 
are set to 2^63 or close to that number.  Maybe it's 2^64 -1.  Whatever.

I just feel that there has to be a better way!  Is there something that 
I'm missing?  Perhaps a tres.memory or something which has the right 
value when in job_submit.lua?

Thanks,
Bill


Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Jeffrey T Frey
Does your Slurm cgroup or node OS cgroup configuration limit the virtual 
address space of processes?  The "Error memory mapping" is thrown by blast when 
trying to create a virtual address space that exposes the contents of a file on 
disk (see "man mmap") so the file can be accessed via pointers (with the OS 
handling paging data in and out of the file on disk) rather than by means of 
standard file i/o calls (e.g. fread(), fscanf(), read()).  It sounds like you 
don't have enough system RAM, period, or the cgroup 
"memory.memsw.limit_in_bytes" is set too low for the amount of file content 
you're attempting to mmap() into the virtual address space (e.g. BIG files).
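
(As a rough sanity check, one can compare the size of the database volumes
blast will mmap() against the limit the job's cgroup actually received; the
paths come from the job script quoted below, and the uid/job id are
placeholders:)

du -ch /home/shams/ncbi-blast-2.9.0+/bin/nr.* | tail -1
cat /sys/fs/cgroup/memory/slurm/uid_UID/job_JOBID/step_0/memory.memsw.limit_in_bytes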




> On Jan 24, 2020, at 07:03 , Mahmood Naderan  wrote:
> 
> Hi,
> Although I can run the blastx command on terminal on all nodes, I can not use 
> slurm for that due to a so called "memory map error".
> Please see below that I pressed ^C after some seconds when running via 
> terminal.
> 
> Fri Jan 24 15:29:57 +0330 2020
> [shams@hpc ~]$ blastx -db ~/ncbi-blast-2.9.0+/bin/nr -query 
> ~/khTrinityfilterless1.fasta -max_target_seqs 5 -outfmt 6 -evalue 1e-5 
> -num_threads 2
> ^C
> [shams@hpc ~]$ date
> Fri Jan 24 15:30:09 +0330 2020
> 
> 
> However, the following script fails
> 
> [shams@hpc ~]$ cat slurm_blast.sh
> #!/bin/bash
> #SBATCH --job-name=blast1
> #SBATCH --output=my_blast.log
> #SBATCH --partition=SEA
> #SBATCH --account=fish
> #SBATCH --mem=38GB
> #SBATCH --nodelist=hpc
> #SBATCH --nodes=1
> #SBATCH --ntasks-per-node=2
> 
> export PATH=~/ncbi-blast-2.9.0+/bin:$PATH
> blastx -db ~/ncbi-blast-2.9.0+/bin/nr -query ~/khTrinityfilterless1.fasta 
> -max_target_seqs 5 -outfmt 6 -evalue 1e-5 -num_threads 2
> [shams@hpc ~]$ sbatch slurm_blast.sh
> Submitted batch job 284
> [shams@hpc ~]$ cat my_blast.log
> Error memory mapping:/home/shams/ncbi-blast-2.9.0+/bin/nr.52.psq 
> openedFilesCount=151 threadID=0
> Error: NCBI C++ Exception:
> T0 
> "/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_560232_130.14.18.6_9008__PrepareRelease_Linux64-Centos_1552331742/c++/compilers/unix/../../src/corelib/ncbiobj.cpp",
>  line 981: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to 
> access NULL pointer.
>  Stack trace:
>   blastx ???:0 ncbi::CStackTraceImpl::CStackTraceImpl() offset=0x77 
> addr=0x1d95da7
>   blastx ???:0 ncbi::CStackTrace::CStackTrace(std::string const&) 
> offset=0x25 addr=0x1d98465
>   blastx ???:0 ncbi::CException::x_GetStackTrace() offset=0xA0 
> addr=0x1ec7330
>   blastx ???:0 ncbi::CException::SetSeverity(ncbi::EDiagSev) offset=0x49 
> addr=0x1ec2169
>   blastx ???:0 ncbi::CObject::ThrowNullPointerException() offset=0x2D2 
> addr=0x1f42582
>   blastx ???:0 ncbi::blast::CBlastTracebackSearch::Run() offset=0x61C 
> addr=0xf2929c
>   blastx ???:0 ncbi::blast::CLocalBlast::Run() offset=0x404 addr=0xed4684
>   blastx ???:0 CBlastxApp::Run() offset=0xC9C addr=0x9cbf7c
>   blastx ???:0 ncbi::CNcbiApplication::x_TryMain(ncbi::EAppDiagStream, 
> char const*, int*, bool*) offset=0x8E3 addr=0x1da0e13
>   blastx ???:0 ncbi::CNcbiApplication::AppMain(int, char const* const*, 
> char const* const*, ncbi::EAppDiagStream, char const*, std::string const&) 
> offset=0x782 addr=0x1d9f6b2
>   blastx ???:0 main offset=0x5E5 addr=0x9caa05
>   /lib64/libc.so.6 ???:0 __libc_start_main offset=0xF5 addr=0x7f9a0fb3e505
>   blastx ???:0 blastx() [0x9ca345] offset=0x0 addr=0x9ca345
> 
> 
> 
> Any idea about that?
> 
> 
> Regards,
> Mahmood
> 
> 




[slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
Hi,
Although I can run the blastx command in a terminal on all nodes, I can not
use slurm for that due to a so-called "memory map error".
Please see below; I pressed ^C after a few seconds when running via the
terminal.

Fri Jan 24 15:29:57 +0330 2020
[shams@hpc ~]$ blastx -db ~/ncbi-blast-2.9.0+/bin/nr -query
~/khTrinityfilterless1.fasta -max_target_seqs 5 -outfmt 6 -evalue 1e-5
-num_threads 2
^C
[shams@hpc ~]$ date
Fri Jan 24 15:30:09 +0330 2020


However, the following script fails

[shams@hpc ~]$ cat slurm_blast.sh
#!/bin/bash
#SBATCH --job-name=blast1
#SBATCH --output=my_blast.log
#SBATCH --partition=SEA
#SBATCH --account=fish
#SBATCH --mem=38GB
#SBATCH --nodelist=hpc
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2

export PATH=~/ncbi-blast-2.9.0+/bin:$PATH
blastx -db ~/ncbi-blast-2.9.0+/bin/nr -query ~/khTrinityfilterless1.fasta
-max_target_seqs 5 -outfmt 6 -evalue 1e-5 -num_threads 2
[shams@hpc ~]$ sbatch slurm_blast.sh
Submitted batch job 284
[shams@hpc ~]$ cat my_blast.log
Error memory mapping:/home/shams/ncbi-blast-2.9.0+/bin/nr.52.psq
openedFilesCount=151 threadID=0
Error: NCBI C++ Exception:
T0
"/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_560232_130.14.18.6_9008__PrepareRelease_Linux64-Centos_1552331742/c++/compilers/unix/../../src/corelib/ncbiobj.cpp",
line 981: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
access NULL pointer.
 Stack trace:
  blastx ???:0 ncbi::CStackTraceImpl::CStackTraceImpl() offset=0x77
addr=0x1d95da7
  blastx ???:0 ncbi::CStackTrace::CStackTrace(std::string const&)
offset=0x25 addr=0x1d98465
  blastx ???:0 ncbi::CException::x_GetStackTrace() offset=0xA0
addr=0x1ec7330
  blastx ???:0 ncbi::CException::SetSeverity(ncbi::EDiagSev)
offset=0x49 addr=0x1ec2169
  blastx ???:0 ncbi::CObject::ThrowNullPointerException() offset=0x2D2
addr=0x1f42582
  blastx ???:0 ncbi::blast::CBlastTracebackSearch::Run() offset=0x61C
addr=0xf2929c
  blastx ???:0 ncbi::blast::CLocalBlast::Run() offset=0x404
addr=0xed4684
  blastx ???:0 CBlastxApp::Run() offset=0xC9C addr=0x9cbf7c
  blastx ???:0 ncbi::CNcbiApplication::x_TryMain(ncbi::EAppDiagStream,
char const*, int*, bool*) offset=0x8E3 addr=0x1da0e13
  blastx ???:0 ncbi::CNcbiApplication::AppMain(int, char const* const*,
char const* const*, ncbi::EAppDiagStream, char const*, std::string const&)
offset=0x782 addr=0x1d9f6b2
  blastx ???:0 main offset=0x5E5 addr=0x9caa05
  /lib64/libc.so.6 ???:0 __libc_start_main offset=0xF5
addr=0x7f9a0fb3e505
  blastx ???:0 blastx() [0x9ca345] offset=0x0 addr=0x9ca345



Any idea about that?


Regards,
Mahmood


[slurm-users] Multinode blast run

2020-01-24 Thread Mahmood Naderan
Hi,
Has anyone run blast on multiple nodes via slurm? The question should probably
be asked of the blast developers, but I didn't find their discussion mailing
list.

I see the example at [1], which uses "-N 1" and "--ntasks-per-node",
so that limits the run to a single node.

Thanks for any comment.

[1] http://hpc.mediawiki.hull.ac.uk/Applications/Ncbi-blast


Regards,
Mahmood