Hello Tim,

Thank you very much. All your guesses hit the target. I then did the following 
and ran into some weird errors.


I changed the name of the queue in PostFreeSurferPipelineBatch.sh like this:

#    QUEUE="-q hcp_priority.q"
    QUEUE="-q all.q" (I have this queue in SGE)
and gave it 3 subjects at the same time:
Subjlist="100307 100308 100309"

Then I encountered this error: "Unable to run job: Backslash ('\') not allowed 
in objectname."
My username on the machine has the form ADdomain\userid. Is SGE unhappy with 
a backslash in the username?
I set the user in the queue to be the same as my username on the machine.
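
A quick way to confirm whether the login name is the string carrying the backslash is to test it directly in the shell. This is only a sketch: has_backslash is a hypothetical helper name, and the check says nothing about which SGE object name actually triggered the error.

```shell
#!/bin/sh
# has_backslash is a hypothetical helper: succeeds if $1 contains a backslash.
has_backslash() {
    case $1 in
        *\\*) return 0 ;;
        *)    return 1 ;;
    esac
}

# id -un prints the effective user name of the current shell.
user=$(id -un)
if has_backslash "$user"; then
    echo "username '$user' contains a backslash"
else
    echo "username '$user' has no backslash"
fi
```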

This machine is a server and I cannot change my username, but I do have sudo 
rights, so I switched to root to run the job (root's username is just "root"). 
Since all the software had been installed with sudo under my own username, I 
redid all the setup for the root user and created a projects folder under 
/root/. The script then submitted the 3 subjects successfully, the terminal 
returned their task IDs, and the .sh.o... and .sh.e... files were generated. 
However, all the tasks failed, and the .sh.e... files contain messages like 
this:

# cat PreFreeSurferPipeline.sh.e23
/usr/share/fsl/5.0/bin/fslhd: error while loading shared libraries: libfslio.so: cannot open shared object file: No such file or directory
/usr/share/fsl/5.0/bin/fslhd: error while loading shared libraries: libfslio.so: cannot open shared object file: No such file or directory
/usr/share/fsl/5.0/bin/fslreorient2std: 99: [: =: unexpected operator
/usr/share/fsl/5.0/bin/fslreorient2std: 108: [: =: unexpected operator
/usr/share/fsl/5.0/bin/fslhd: error while loading shared libraries: libfslio.so: cannot open shared object file: No such file or directory
/usr/share/fsl/5.0/bin/avscale: error while loading shared libraries: libnewimage.so: cannot open shared object file: No such file or directory
(standard_in) 1: syntax error
/usr/share/fsl/5.0/bin/fslhd: error while loading shared libraries: libfslio.so: cannot open shared object file: No such file or directory
/usr/share/fsl/5.0/bin/fslswapdim: 107: [: =: unexpected operator
/usr/share/fsl/5.0/bin/fslhd: error while loading shared libraries: libfslio.so: cannot open shared object file: No such file or directory
/usr/share/fsl/5.0/bin/fslswapdim: 115: [: -gt: unexpected operator
/usr/share/fsl/5.0/bin/fslhd: error while loading shared libraries: libfslio.so: cannot open shared object file: No such file or directory
/usr/share/fsl/5.0/bin/fslswapdim: 117: [: -gt: unexpected operator
/usr/share/fsl/5.0/bin/robustfov: error while loading shared libraries: libnewimage.so: cannot open shared object file: No such file or directory

I think I didn't configure things correctly for the root user: the 
LD_LIBRARY_PATH variable differs between my own user ($ prompt) and root 
(# prompt).

$ echo "$LD_LIBRARY_PATH"
/usr/lib/fsl/5.0::/lib:/usr/local/lib:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/local/MATLAB/MATLAB_Runtime/v84/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v84/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v84/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v84/sys/opengl/lib/glnxa64:/usr/bxh_xcede_tools-1.11.1-lsb30.x86_64

# echo "$LD_LIBRARY_PATH"
/usr/lib/fsl/5.0
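
To see exactly which directories differ between two LD_LIBRARY_PATH values like the ones above, a small helper can list the entries present in one list but absent from the other. This is only a sketch: missing_entries is a hypothetical name, user_path is an illustrative subset of the value shown above, and nothing here implies that adding those entries for root fixes the failing jobs.

```shell
#!/bin/sh
# missing_entries is a hypothetical helper: prints each entry of the
# colon-separated list $1 that does not appear in the colon-separated list $2.
missing_entries() {
    old_ifs=$IFS
    IFS=:
    for entry in $1; do
        # Skip empty fields such as the one produced by "::".
        [ -n "$entry" ] || continue
        case ":$2:" in
            *":$entry:"*) ;;                    # present in both lists
            *) printf '%s\n' "$entry" ;;        # only in the first list
        esac
    done
    IFS=$old_ifs
}

# Illustrative subset of the two values quoted above (assumption: shortened).
user_path="/usr/lib/fsl/5.0::/lib:/usr/local/lib"
root_path="/usr/lib/fsl/5.0"
missing_entries "$user_path" "$root_path"
```

In practice the two arguments would be the real values captured from each account, for example by saving the output of echo "$LD_LIBRARY_PATH" under both users.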

What shall I do to fix this? Thank you very much.

Best,
Gengyan



________________________________
From: Timothy Coalson <tsc...@mst.edu>
Sent: Friday, May 13, 2016 3:21 PM
To: Gengyan Zhao
Cc: hcp-users@humanconnectome.org; glass...@wustl.edu; tbbr...@wustl.edu
Subject: Re: [HCP-Users] How to monitor Pipeline's running with SGE (multi-CPU 
cores)

Some guesses: have you set the queue name in your modified ...PipelineBatch.sh 
script(s) to one that exists on your system?  Does the subject list contain 
more than one subject?  If not, then queue submission won't do much, as it 
isn't about multithreading on single subjects, and running it with an unknown 
queue (default has some name containing "hcp") will not submit it to any queue, 
and will instead run it foreground in that terminal, but capturing stdout and 
stderr to files.

Also note that some wb_command steps do have multithreading (FSL and freesurfer 
generally don't, and not all wb_command steps do either), and will use all 
cores it can find on the machine by default, and if you have a multi-socket 
system, this will be much slower than restricting it to run on one socket, as 
cross-socket memory access is considerably slower - you can export 
OMP_NUM_THREADS=8 or similar to reduce the number of threads it tries to use, 
but you might need to do something more to ensure the threads stay on one 
socket.
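
As a rough sketch of the suggestion above, the thread cap can be exported before the pipeline is started, and a wrapper can try to keep a command on one socket. The use of numactl here is an assumption, not something from this thread: it is one common tool for binding CPU and memory to a NUMA node, it may not be installed, and node 0 is an arbitrary choice. run_on_one_socket is a hypothetical helper name.

```shell
#!/bin/sh
# Cap the number of OpenMP threads, as suggested above.
export OMP_NUM_THREADS=8

# run_on_one_socket is a hypothetical helper. If numactl is installed and
# binding to NUMA node 0 works on this machine, the command is run with its
# CPUs and memory bound to that node; otherwise it is run unchanged.
run_on_one_socket() {
    if command -v numactl >/dev/null 2>&1 &&
       numactl --cpunodebind=0 --membind=0 true 2>/dev/null; then
        numactl --cpunodebind=0 --membind=0 "$@"
    else
        "$@"
    fi
}

# Stand-in command; a real run would pass the pipeline script instead.
run_on_one_socket sh -c 'echo "threads: $OMP_NUM_THREADS"'
```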

Tim


On Fri, May 13, 2016 at 11:20 AM, Gengyan Zhao <gzha...@wisc.edu> wrote:

Hello Tim and Matt,


After I configured the queues according to the FSLSGE website, my terminal 
shows this:


$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@localhost                BIP   0/0/31         0.28     lx26-amd64
---------------------------------------------------------------------------------
long.q@localhost               BIP   0/0/31         0.28     lx26-amd64
---------------------------------------------------------------------------------
mainqueue@localhost            BIP   0/0/31         0.28     lx26-amd64
---------------------------------------------------------------------------------
short.q@localhost              BIP   0/0/31         0.28     lx26-amd64
---------------------------------------------------------------------------------
verylong.q@localhost           BIP   0/0/31         0.28     lx26-amd64
---------------------------------------------------------------------------------
veryshort.q@localhost          BIP   0/0/31         0.28     lx26-amd64

but still nothing for

$ qstat -u "*"
$ qstat -u `whoami`
$


However, I found that everything about these queues is commented out in fsl_sub:

map_qname ()
{
        # for Debian we can't do the stuff below, because it would be hard
        # to determine how particular queues are meant to be used on any given
        # system. Instead of translating into a queue name we specify proper
        # resource limits, and let SGE decide what queue matches
        # (qsub wants the time limit in seconds)
        queueCmd="$queueCmd -l h_rt=$(echo "$1 * 60" | bc)"
    #if [ $1 -le 20 ] ; then
        #queue=veryshort.q
    #elif [ $1 -le 120 ] ; then
        #queue=short.q
    #elif [ $1 -le 1440 ] ; then
        #queue=long.q
    #else
        #queue=verylong.q
    #fi
    #queueCmd=" -q $queue "

    #echo "Estimated time was $1 mins: queue name is $queue"
}
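
To make the two behaviours in map_qname concrete, here is a minimal sketch that separates them: the active line turns the estimated minutes into seconds for -l h_rt, while the commented-out block would have picked a queue name from fixed thresholds. The function names minutes_to_h_rt and queue_for_minutes are hypothetical, and shell arithmetic is used in place of bc.

```shell
#!/bin/sh
# minutes_to_h_rt mirrors the active line of map_qname: minutes -> seconds.
minutes_to_h_rt() {
    echo $(( $1 * 60 ))
}

# queue_for_minutes reproduces the thresholds of the commented-out block.
queue_for_minutes() {
    if [ "$1" -le 20 ]; then
        echo veryshort.q
    elif [ "$1" -le 120 ]; then
        echo short.q
    elif [ "$1" -le 1440 ]; then
        echo long.q
    else
        echo verylong.q
    fi
}

# What the Debian version passes to qsub for a 120-minute estimate...
echo "-l h_rt=$(minutes_to_h_rt 120)"
# ...versus what the commented-out code would have passed.
echo "-q $(queue_for_minutes 120)"
```

For a 120-minute estimate this prints -l h_rt=7200 and -q short.q, so with the Debian version the queue is never named explicitly and SGE is left to pick one that satisfies the time limit.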

I don't know why; this is how the file was, and I have never changed it since 
installation. How, then, can fsl_sub use these queues? I'm using an Ubuntu 
14.04 system. Thanks.

Best,
Gengyan


________________________________
From: hcp-users-boun...@humanconnectome.org on behalf of Gengyan Zhao <gzha...@wisc.edu>
Sent: Friday, May 13, 2016 12:14:47 AM
To: hcp-users@humanconnectome.org; glass...@wustl.edu; tbbr...@wustl.edu

Subject: Re: [HCP-Users] How to monitor Pipeline's running with SGE (multi-CPU 
cores)


Hi Matt,


Thank you for your quick response too. What should I do so that the SGE 
execution host is recognized? Please also see the email below about how I 
installed and configured SGE. Thanks.


Best,

Gengyan


________________________________
From: hcp-users-boun...@humanconnectome.org on behalf of Gengyan Zhao <gzha...@wisc.edu>
Sent: Friday, May 13, 2016 12:02 AM
To: hcp-users@humanconnectome.org; tbbr...@wustl.edu
Subject: Re: [HCP-Users] How to monitor Pipeline's running with SGE (multi-CPU 
cores)


Hi Tim,


Thank you for your quick response. I tried both commands while the pipeline 
was running, and nothing showed up at all. I copied the following from the 
terminal.


$ qstat -u "*"

$ qstat -u `whoami`
$

I redid the installation and setup of SGE following this link: 
https://scidom.wordpress.com/2012/01/18/sge-on-single-pc/

I can see the qmon window pop up after I use the qmon command, and I can see 
the sge processes running like this:
# ps aux | grep "sge"
sgeadmin 13315  0.0  0.0  55440  3848 ?        Sl   23:17   0:00 /usr/lib/gridengine/sge_execd
sgeadmin 13377  0.2  0.0 140184  7356 ?        Sl   23:17   0:00 /usr/lib/gridengine/sge_qmaster
root     13411  0.0  0.0  11748  2132 pts/4    S+   23:17   0:00 grep --color=auto sge

So I assume SGE was installed correctly but not set up correctly. Besides 
setting up a queue as described in the link, I set FSLPARALLEL=1 in fsl.sh and 
sourced it, and left FSL_ROOT and FSLCLUSTER_DEFAULT_QUEUE unset.
Is there any error in my installation or configuration of SGE? Thanks.

Best,
Gengyan

________________________________
From: Timothy B. Brown <tbbr...@wustl.edu>
Sent: Thursday, May 12, 2016 3:26:22 PM
To: Gengyan Zhao; hcp-users@humanconnectome.org
Subject: Re: [HCP-Users] How to monitor Pipeline's running with SGE (multi-CPU 
cores)

Hi Gengyan,

Please try the following commands:

To list all the jobs running on your grid:

$ qstat -u "*"

Be sure to enclose the asterisk in double quotes as shown.

To list all the jobs running on your grid that were submitted under your user 
account:

$ qstat -u `whoami`

Be sure to use "back quotes" around the whoami command (single quotes that go 
from the upper-left towards the lower-right.)

Let's see if you see any jobs running that way.
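
For repeated checks, the two commands above can be wrapped so that they print a job count instead of the full listing. This is a sketch under two assumptions: that every job line in the qstat output starts with a numeric job ID, and that the header and separator lines do not. count_jobs is a hypothetical helper name, and $(whoami) is used as the equivalent of the back-quoted form.

```shell
#!/bin/sh
# count_jobs is a hypothetical helper: counts lines on stdin whose first
# field is purely numeric (assumed to be a job ID), ignoring other lines.
count_jobs() {
    awk '$1 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# Only call qstat if it is installed on this machine.
if command -v qstat >/dev/null 2>&1; then
    echo "my jobs:  $(qstat -u "$(whoami)" | count_jobs)"
    echo "all jobs: $(qstat -u '*' | count_jobs)"
fi
```

A count of 0 for both corresponds to the empty output reported elsewhere in this thread, that is, nothing has been submitted to the grid.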

  Tim

On Thu, May 12, 2016, at 14:24, Gengyan Zhao wrote:
> Hello HCP Masters,
>
> My question is how I can tell whether SGE and FSL are set up properly and
> whether FSL is running on multiple CPU cores. How can I monitor the parallel
> running of FSL with SGE?
>
> I'm using a 32-core 3.0GHz, 128GB RAM, Ubuntu 14.04 machine to run FSL,
> and I'm an HCP pipeline user. SGE was set up according to the instructions
> in the external link given on the FSL website
> (http://chrisfilo.tumblr.com/post/579493955/how-to-configure-sun-grid-engine-for-fsl-under).
> FSLPARALLEL=1. SGE_ROOT has not been set.
>
> When I run the pipeline, most of the time only 3.1% of the CPU is in
> use. Since 3.1%*32 is roughly 100%, most of the time only one core is
> occupied. Using top and pressing 1 to see the activity of each core in real
> time, almost all the time only one core is at 100% usage. With the command
> qstat -f, I can only see the queue I configured myself following the
> instructions in the external link. This is the output of qstat -f while FSL
> (actually PreFreeSurferPipelineBatch.sh in the HCP pipeline, which calls
> a bunch of FSL tools) is running.
>
> queuename qtype resv/used/tot. load_avg arch states
> ---------------------------------------------------------------------------------
> mainqueue@localhost BIP 0/0/31 -NA- -NA- au
>
> Thanks.
>
> Best,
> Gengyan
>
> Research Assistant
> Medical Physics, UW-Madison
>
> _______________________________________________
> HCP-Users mailing list
> HCP-Users@humanconnectome.org
> http://lists.humanconnectome.org/mailman/listinfo/hcp-users

--
Timothy B. Brown
Business & Technology Application Analyst III
Pipeline Developer (Human Connectome Project)
tbbrown(at)wustl.edu

_______________________________________________
HCP-Users mailing list
HCP-Users@humanconnectome.org
http://lists.humanconnectome.org/mailman/listinfo/hcp-users
