Hello Tim,
Thank you very much. All your guesses hit the target. I then made the following changes and ran into some strange errors.

I changed the name of the queue in PostFreeSurferPipelineBatch.sh like this:

    # QUEUE="-q hcp_priority.q"
    QUEUE="-q all.q"

(I have this queue in SGE) and gave it 3 subjects at the same time:

    Subjlist="100307 100308 100309"

Then I encountered this error:

    "Unable to run job: Backslash ('\') not allowed in objectname."

My username on the machine has the form ADdomain\userid. Is SGE unhappy with a backslash in the username? I set the user in the queue to be the same as my username on the machine. The machine is a server, so I cannot change my username, but I do have sudo rights, so I switched to root to run the job (the username of root is just "root"). Since all the software had been installed with sudo under my own username, I redid all the setup for the root user and created a projects folder under /root/.

The script then submitted the 3 subjects successfully, the terminal returned their task IDs, and the .sh.o... and .sh.e... files were generated. However, all the tasks failed, and the .sh.e... files contain errors like this:

    # cat PreFreeSurferPipeline.sh.e23
    /usr/share/fsl/5.0/bin/fslhd: error while loading shared libraries: libfslio.so: cannot open shared object file: No such file or directory
    /usr/share/fsl/5.0/bin/fslhd: error while loading shared libraries: libfslio.so: cannot open shared object file: No such file or directory
    /usr/share/fsl/5.0/bin/fslreorient2std: 99: [: =: unexpected operator
    /usr/share/fsl/5.0/bin/fslreorient2std: 108: [: =: unexpected operator
    /usr/share/fsl/5.0/bin/fslhd: error while loading shared libraries: libfslio.so: cannot open shared object file: No such file or directory
    /usr/share/fsl/5.0/bin/avscale: error while loading shared libraries: libnewimage.so: cannot open shared object file: No such file or directory
    (standard_in) 1: syntax error
    /usr/share/fsl/5.0/bin/fslhd: error while loading shared libraries: libfslio.so: cannot open shared object file: No such file or directory
    /usr/share/fsl/5.0/bin/fslswapdim: 107: [: =: unexpected operator
    /usr/share/fsl/5.0/bin/fslhd: error while loading shared libraries: libfslio.so: cannot open shared object file: No such file or directory
    /usr/share/fsl/5.0/bin/fslswapdim: 115: [: -gt: unexpected operator
    /usr/share/fsl/5.0/bin/fslhd: error while loading shared libraries: libfslio.so: cannot open shared object file: No such file or directory
    /usr/share/fsl/5.0/bin/fslswapdim: 117: [: -gt: unexpected operator
    /usr/share/fsl/5.0/bin/robustfov: error while loading shared libraries: libnewimage.so: cannot open shared object file: No such file or directory

I think I didn't configure things correctly for the root user, and the LD_LIBRARY_PATH variable differs between root and my own username.
    $ echo "$LD_LIBRARY_PATH"
    /usr/lib/fsl/5.0::/lib:/usr/local/lib:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/local/MATLAB/MATLAB_Runtime/v84/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v84/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v84/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v84/sys/opengl/lib/glnxa64:/usr/bxh_xcede_tools-1.11.1-lsb30.x86_64

    # echo "$LD_LIBRARY_PATH"
    /usr/lib/fsl/5.0

What should I do to fix this? Thank you very much.

Best,
Gengyan

________________________________
From: Timothy Coalson <tsc...@mst.edu>
Sent: Friday, May 13, 2016 3:21 PM
To: Gengyan Zhao
Cc: hcp-users@humanconnectome.org; glass...@wustl.edu; tbbr...@wustl.edu
Subject: Re: [HCP-Users] How to monitor Pipeline's running with SGE (multi-CPU cores)

Some guesses: have you set the queue name in your modified ...PipelineBatch.sh script(s) to one that exists on your system? Does the subject list contain more than one subject? If not, then queue submission won't do much, as it isn't about multithreading on single subjects. Running it with an unknown queue (the default has some name containing "hcp") will not submit it to any queue, and will instead run it in the foreground in that terminal, but capturing stdout and stderr to files.

Also note that some wb_command steps are multithreaded (FSL and FreeSurfer generally aren't, and not all wb_command steps are either) and will use all the cores they can find on the machine by default. On a multi-socket system this will be much slower than restricting them to one socket, as cross-socket memory access is considerably slower. You can export OMP_NUM_THREADS=8 or similar to reduce the number of threads they try to use, but you might need to do something more to ensure the threads stay on one socket.
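A minimal sketch of the thread-limiting suggestion above, assuming a POSIX shell and, for the pinning part, that the numactl utility is installed. The helper name run_on_one_socket and the choice of NUMA node 0 are illustrative assumptions, not part of the HCP Pipelines, and a NUMA node does not always map to exactly one socket.

```shell
# Cap the number of OpenMP threads (8 is only an example value).
export OMP_NUM_THREADS=8

# Hypothetical helper, not part of the HCP Pipelines: run a command bound to
# the CPUs and memory of NUMA node 0 when numactl is available, otherwise run
# the command unchanged.
run_on_one_socket() {
    if command -v numactl >/dev/null 2>&1; then
        numactl --cpunodebind=0 --membind=0 "$@"
    else
        "$@"
    fi
}

# Example use (wb_command arguments elided on purpose):
# run_on_one_socket wb_command ...
```

Whether pinning helps depends on the machine; checking the node layout first with numactl --hardware is a reasonable step before relying on node 0.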
Tim

On Fri, May 13, 2016 at 11:20 AM, Gengyan Zhao <gzha...@wisc.edu> wrote:

Hello Tim and Matt,

After I configured the queues according to the FSLSGE website, my terminal shows this:

    $ qstat -f
    queuename                      qtype resv/used/tot. load_avg arch          states
    ---------------------------------------------------------------------------------
    all.q@localhost                BIP   0/0/31         0.28     lx26-amd64
    ---------------------------------------------------------------------------------
    long.q@localhost               BIP   0/0/31         0.28     lx26-amd64
    ---------------------------------------------------------------------------------
    mainqueue@localhost            BIP   0/0/31         0.28     lx26-amd64
    ---------------------------------------------------------------------------------
    short.q@localhost              BIP   0/0/31         0.28     lx26-amd64
    ---------------------------------------------------------------------------------
    verylong.q@localhost           BIP   0/0/31         0.28     lx26-amd64
    ---------------------------------------------------------------------------------
    veryshort.q@localhost          BIP   0/0/31         0.28     lx26-amd64

but there is still nothing for

    $ qstat -u "*"
    $ qstat -u `whoami`
    $

However, I found that everything about these queues is commented out in fsl_sub:

    map_qname ()
    {
        # for Debian we can't do the stuff below, because it would be hard
        # to determine how particular queues are meant to be used on any given
        # system. Instead of translating into a queue name we specify proper
        # resource limits, and let SGE decide what queue matches
        # (qsub wants the time limit in seconds)
        queueCmd="$queueCmd -l h_rt=$(echo "$1 * 60" | bc)"
        #if [ $1 -le 20 ] ; then
        #queue=veryshort.q
        #elif [ $1 -le 120 ] ; then
        #queue=short.q
        #elif [ $1 -le 1440 ] ; then
        #queue=long.q
        #else
        #queue=verylong.q
        #fi
        #queueCmd=" -q $queue "
        #echo "Estimated time was $1 mins: queue name is $queue"
    }

I don't know why. This is how the file came, and I have never changed it since installation. How, then, can fsl_sub use these queues? I'm using an Ubuntu 14.04 system. Thanks.
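The one active line in map_qname converts the estimated run time from minutes to seconds and hands it to qsub as an h_rt resource limit instead of naming a queue. A minimal sketch of that conversion on its own, where minutes_to_h_rt is a hypothetical name (not an fsl_sub function) and shell arithmetic stands in for the bc call:

```shell
# Hypothetical helper: turn a run time in minutes into the "-l h_rt=<seconds>"
# option that the quoted map_qname appends to its qsub command line.
minutes_to_h_rt() {
    echo "-l h_rt=$(( $1 * 60 ))"
}

minutes_to_h_rt 20     # prints: -l h_rt=1200
minutes_to_h_rt 1440   # prints: -l h_rt=86400
```

With this approach no queue is chosen by name; as the comment in the quoted function says, SGE is left to decide which queue matches the requested limit.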
Best,
Gengyan

________________________________
From: hcp-users-boun...@humanconnectome.org <hcp-users-boun...@humanconnectome.org> on behalf of Gengyan Zhao <gzha...@wisc.edu>
Sent: Friday, May 13, 2016 12:14:47 AM
To: hcp-users@humanconnectome.org; glass...@wustl.edu; tbbr...@wustl.edu
Subject: Re: [HCP-Users] How to monitor Pipeline's running with SGE (multi-CPU cores)

Hi Matt,

Thank you for your quick response too. What should I do to get the SGE execution host recognized? Please also see the email below about how I installed and configured SGE. Thanks.

Best,
Gengyan

________________________________
From: hcp-users-boun...@humanconnectome.org <hcp-users-boun...@humanconnectome.org> on behalf of Gengyan Zhao <gzha...@wisc.edu>
Sent: Friday, May 13, 2016 12:02 AM
To: hcp-users@humanconnectome.org; tbbr...@wustl.edu
Subject: Re: [HCP-Users] How to monitor Pipeline's running with SGE (multi-CPU cores)

Hi Tim,

Thank you for your quick response. I tried both of the commands while the pipeline was running, and nothing showed up, nothing at all. I copied the following from the terminal.
    $ qstat -u "*"
    $ qstat -u `whoami`
    $

I redid the installation and setup of SGE following this link:

https://scidom.wordpress.com/2012/01/18/sge-on-single-pc/

I can see the qmon window pop up after I run the qmon command, and I can see the SGE processes running like this:

    # ps aux | grep "sge"
    sgeadmin 13315  0.0  0.0  55440  3848 ?      Sl   23:17   0:00 /usr/lib/gridengine/sge_execd
    sgeadmin 13377  0.2  0.0 140184  7356 ?      Sl   23:17   0:00 /usr/lib/gridengine/sge_qmaster
    root     13411  0.0  0.0  11748  2132 pts/4  S+   23:17   0:00 grep --color=auto sge

So I assume SGE was installed correctly, but not set up correctly. Besides setting up a queue as described in the link, I set FSLPARALLEL=1 in fsl.sh and sourced it, and left FSL_ROOT and FSLCLUSTER_DEFAULT_QUEUE unset. Is there any error in my installation or configuration of SGE? Thanks.

Best,
Gengyan

________________________________
From: Timothy B. Brown <tbbr...@wustl.edu>
Sent: Thursday, May 12, 2016 3:26:22 PM
To: Gengyan Zhao; hcp-users@humanconnectome.org
Subject: Re: [HCP-Users] How to monitor Pipeline's running with SGE (multi-CPU cores)

Hi Gengyan,

Please try the following commands.

To list all the jobs running on your grid:

    $ qstat -u "*"

Be sure to enclose the asterisk in double quotes as shown.

To list all the jobs running on your grid that were submitted under your user account:

    $ qstat -u `whoami`

Be sure to use "back quotes" around the whoami command (single quotes that go from the upper-left towards the lower-right).
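In POSIX shells the back-quote form and the $(...) form of command substitution are equivalent, so the second command can also be written without back quotes. A small sketch of that equivalence; jobs_for_current_user is a hypothetical wrapper name, and it assumes qstat is on the PATH when it is actually called:

```shell
# Both forms expand to the current user name.
user_backquote=`whoami`
user_dollar=$(whoami)

# Hypothetical wrapper: list the current user's jobs, or report that qstat
# cannot be found.
jobs_for_current_user() {
    if command -v qstat >/dev/null 2>&1; then
        qstat -u "$(whoami)"
    else
        echo "qstat not found on PATH" >&2
        return 1
    fi
}
```

The $(...) form is often easier to read and to nest, and it avoids confusing back quotes with ordinary single quotes.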
Let's see if you see any jobs running that way.

Tim

On Thu, May 12, 2016, at 14:24, Gengyan Zhao wrote:
> Hello HCP Masters,
>
> My question is: how can I know that SGE and FSL are set up properly and
> that FSL is running on multiple CPU cores? How can I monitor the parallel
> running of FSL with SGE?
>
> I'm using a 32-core 3.0GHz, 128GB RAM, Ubuntu 14.04 machine to run FSL,
> and I'm an HCP pipeline user. SGE was set up according to the instructions
> in the external link given by the FSL website
> (http://chrisfilo.tumblr.com/post/579493955/how-to-configure-sun-grid-engine-for-fsl-under).
> FSLPARALLEL=1. SGE_ROOT has not been set.
>
> When I run the pipeline, most of the time only 3.1% of the CPU is in
> use. Since 3.1% x 32 is roughly 100% of one core, most of the time only
> one core is occupied. With the top command, pressing 1 to see the activity
> of each core in real time, almost all the time only one core is at 100%
> usage. With the command qstat -f, I can only see the queue I configured
> myself following the instructions in the external link. This is the output
> of qstat -f when FSL (actually PreFreeSurferPipelineBatch.sh in the HCP
> pipeline, which calls a bunch of FSL tools) is running.
>
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> mainqueue@localhost            BIP   0/0/31         -NA-     -NA-          au
>
> Thanks.
>
> Best,
> Gengyan
>
> Research Assistant
> Medical Physics, UW-Madison
>
> _______________________________________________
> HCP-Users mailing list
> HCP-Users@humanconnectome.org
> http://lists.humanconnectome.org/mailman/listinfo/hcp-users

--
Timothy B. Brown
Business & Technology Application Analyst III
Pipeline Developer (Human Connectome Project)
tbbrown(at)wustl.edu

________________________________________
The material in this message is private and may contain Protected Healthcare Information (PHI).
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail.

_______________________________________________
HCP-Users mailing list
HCP-Users@humanconnectome.org
http://lists.humanconnectome.org/mailman/listinfo/hcp-users