Pardon me for poking in here.  I hope this is helpful.
It looks like the failure is on an lseek call (the failed assertion in
posixio.c), which would suggest the file isn't there or was truncated, as Zeke
suggests.
When you deploy your jobs to the cloud cluster, perhaps there's a job directive
you can tweak to give you a bigger playpen, i.e. more temporary space.
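
If it helps, here is a minimal sketch (plain Python, not part of FreeSurfer or
the mni tools) of a pre-flight check a job script could run before recon-all,
to see how much room is left where the temporary files get written. The
function names and any threshold you pass in are my own assumptions, not
anything the cluster or FreeSurfer defines. It only reports free space on the
filesystem; it would not detect a per-user quota or an I/O limit of the kind
Zeke describes.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room(path, min_free_gib):
    """Return True if the filesystem holding `path` has at least
    `min_free_gib` GiB free. The threshold is a caller-chosen guess,
    not a documented FreeSurfer requirement."""
    return free_gib(path) >= min_free_gib
```

For example, calling has_room("./T1w", 5.0) just before recon-all and logging
the result for each node might show whether the failing runs line up with a
nearly full filesystem. The 5.0 is a hypothetical value.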

Best - Don


> -----Original Message-----
> From: freesurfer-boun...@nmr.mgh.harvard.edu [mailto:freesurfer-
> boun...@nmr.mgh.harvard.edu] On Behalf Of Z K
> Sent: Friday, November 18, 2016 10:36 AM
> To: freesurfer@nmr.mgh.harvard.edu
> Subject: Re: [Freesurfer] Recon-all: stochastic mri_nu_correct error when run on HPC cluster
> 
> That is a very odd error, and I haven't seen anything like it that I can recall.
> 
> My initial thoughts are that the nu_evaluate and nu_correct commands are part
> of the mni tools shipped with freesurfer. The mni tools do a lot of reading,
> writing and deleting of temporary files. (You can see how it mentions
> 'tmp.mri_nu_correct.mni.223851' in the log file you sent).
> 
> It's possible that your cluster has limits on I/O (ours does, and the mni tools
> were the culprit in our case). Maybe it's deleting some of these temp files, or
> perhaps the temporary space is just getting filled.
> This could cause the nu_evaluate command to fail in an ungraceful way.
> 
> 
> 
> On 11/17/2016 05:14 PM, Anders Perrone wrote:
> > Hi FreeSurfer Developers,
> >
> >
> >
> > I'm running freesurfer from the HCP pipeline developed by the
> > Washington University in St. Louis. It ran just fine on our
> > university's cluster, but when I migrated the pipeline to Exacloud,
> > Intel's HPC cluster, the recon-all command began randomly failing.
> > Occasionally it will work on the first try, but more often than not
> > the same command has to be re-run upwards of 30 times before it will
> > succeed. I've provided the recon-all command and the relevant section
> > of the log file including the error here (attached is the full 
> > recon-all.log file):
> >
> >
> >
> > recon-all -i ./T1w/T1w_acpc_dc_restore_1mm.nii.gz -subjid ${SUBJECT_ID}
> > -sd ./T1w -motioncor -talairach -nuintensitycor -normalization
> >
> >
> >
> > ...
> >
> > [perronea@exanode-3-9.local${PWD}/T1w/washu_INV0WC5U4JA/mri/]
> > [2016-11-17 13:08:38] running:
> >
> >
> > /home/exacloud/lustre1/users/mirandad/usr/local/freesurfer53/mni/bin/make_template
> > -quiet -shrink 3 ./tmp.mri_nu_correct.mni.223851/nu1.mnc
> > ./tmp.mri_nu_correct.mni.223851/1//template.mnc
> >
> >
> >
> > Transforming
> > slices:...............................................................
> > .......................Done
> >
> > Transforming
> > slices:.......................................................................................................................
> > ..................................mincresample:
> > posixio.c:210: px_pgin: Assertion `*posp == ((off_t)(-1)) || *posp ==
> > lseek(nciop->fd, 0, 1)' failed.
> >
> > nu_evaluate: crashed while running mincresample (termination
> > status=134)
> >
> > nu_correct: crashed while running nu_evaluate (termination
> > status=65280)
> >
> > ERROR: nu_correct
> >
> > Linux exanode-3-9.local 2.6.32-504.30.3.el6.x86_64 #1 SMP Wed Jul 15
> > 10:13:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> >
> >
> >
> > recon-all -s washu_INV0WC5U4JA exited with ERRORS at Thu Nov 17
> > 13:08:40 PST 2016
> >
> >
> >
> > It doesn't appear that any similar errors have been reported on
> > freesurfer. Our current work-around is to simply re-run the job on
> > different nodes until it works, but this is not a sustainable
> > long-term solution. Any guidance on troubleshooting this error would
> > be greatly appreciated.
> >
> >
> >
> > FreeSurfer version: freesurfer-Linux-centos6_x86_64-stable-pub-v5.3.0
> >
> > Platform: CentOS release 6.5 (FINAL)
> >
> > uname -a: Linux exalab3.ohsu.edu 2.6.32-504.12.2.el6.x86_64 #1 SMP Wed
> > Mar 11 22:03:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> >
> >
> >
> > Thanks,
> >
> > Anders
> >
> >
> >
> >
> >
> > Anders Perrone,
> >
> > Research Assistant I
> >
> > Fair Neuroimaging Lab
> >
> >
> >
> > perro...@ohsu.edu <mailto:perro...@ohsu.edu>
> >
> > 503-418-1897
> >
> >
> >
> > Oregon Health & Science University
> >
> > Mail code:L470
> >
> > 3181 SW Sam Jackson Park Road
> >
> > Portland, Oregon 97239-3098
> >
> >
> >
> >
> >
> > "If there is no solution to the problem then don't waste time worrying
> > about it. If there is a solution to the problem then don't waste time
> > worrying about it." - Dalai Lama XIV
> >
> >
> >
> >
> >
> > _______________________________________________
> > Freesurfer mailing list
> > Freesurfer@nmr.mgh.harvard.edu
> > https://mail.nmr.mgh.harvard.edu/mailman/listinfo/freesurfer
> >
> 
> 

