Hi Ian,

On Thursday 10 July 2008 06:03:54 am Ian Harry wrote:
> Hi all,
>
> My colleagues and I use, and have used, matplotlib and its TeX
> capabilities quite extensively to create plots to assist in the
> gravitational wave searches we perform (and it has been a great tool for
> us :-) ). However, recently we have been running into problems since we
> started automating our plot generation by running multiple plotting jobs
> concurrently using the condor scheduler (and dagmans). Many of our plotting
> jobs fail with messages such as the one below:
>
> ---snip---
>
> Traceback (most recent call last):
>   File "/home/romain/Projects/ligovirgo/s5_2yr_lv_lowcbc_20080625/868815014-868901414/868815014-868901414/inj001_summary_plots/../executables/plotinjnum", line 298, in ?
>     'eff_dist_h')
>   File "/home/romain/Projects/ligovirgo/s5_2yr_lv_lowcbc_20080625/868815014-868901414/868815014-868901414/inj001_summary_plots/../executables/plotinjnum", line 119, in plot_found_missed
>     fname_thumb = InspiralUtils.savefig_pylal(filename=fname, doThumb=True, dpi_thumb=opts.figure_resolution)
>   File "/home/romain/codes/s5_2yr_lv_lowcbc_20080625/pylal/lib64/python2.4/site-packages/pylal/InspiralUtils.py", line 58, in savefig_pylal
>     fig.savefig(filename_thumb, dpi=dpi_thumb)
> ....
>   File "/usr/lib64/python2.4/site-packages/matplotlib/texmanager.py", line 259, in make_png
>     os.remove(outfile)
> OSError: [Errno 2] No such file or directory: '/home/romain/.matplotlib/tex.cache/ae479c90ff242327b54af004a0846188.output'
>
> ---snip---
>
> My feeling is that when the code invokes TeX it creates a temp file in
> ~/.matplotlib/tex.cache, then deletes it and all other temp TeX files
> when it finishes. This would cause problems if one job is in the middle
> of running TeX when another job deletes its temp files!
>
> We are running a slightly old version of matplotlib (0.87.7). As we run on
> multiple clusters, our sys admins tend to update software only when there
> is a need to, and we have had no other problems with matplotlib. I
> apologize if this has been fixed in the meantime (I did do a quick search
> of the mailing list archive but found nothing). All our clusters currently
> run Fedora Core 4 (we're going to move to CentOS 5).
>
> Currently we are getting around this by forcing condor to retry the failed
> jobs two or three times, which catches most of these errors. Another
> solution would be to limit the number of concurrently running jobs to 1,
> but as we run dagmen from within one 'super' dagman it would prove
> difficult to limit jobs across multiple dagmen.
>
> Anyway, if anyone has any ideas on how to solve this I would appreciate
> it. Also, if there are any options to set the location of these temp TeX
> files and use a different directory for each job (or to stop matplotlib
> deleting other temp files), that would help us.

I'm really hesitant to mess around with the location of the temp files. It was
a bit painful trying to get usetex to work across platforms.
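
That said, if separate directories per job turn out to be necessary, it may
be possible to get them without touching matplotlib at all. The sketch below
assumes that the installed version honours the MPLCONFIGDIR environment
variable and keeps tex.cache underneath it; that should be checked against
0.87.7 before relying on it. The helper name per_job_env is hypothetical and
only there for illustration.

```python
import os
import tempfile


def per_job_env(base_env=None):
    """Return a copy of the environment with a fresh MPLCONFIGDIR.

    per_job_env is a hypothetical helper, not part of matplotlib.
    Assumption: matplotlib places tex.cache under MPLCONFIGDIR.
    """
    # Copy so the caller's environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # mkdtemp creates a new, uniquely named directory on every call,
    # so two concurrent jobs never share the same one.
    env["MPLCONFIGDIR"] = tempfile.mkdtemp(prefix="mplconfig-")
    return env
```

The variable has to be in place before matplotlib is imported, so the
environment returned here would be handed to the process that launches the
plotting script rather than set from inside it.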

Instead, would you try replacing:

os.remove(outfile)

with:

try:
    os.remove(outfile)
except OSError:
    pass

Let me know if that fixes it, and if you need to wrap any other file 
deletions.
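
If several deletions need the same treatment, a small helper might be tidier
than repeating the try/except. This is only a sketch: remove_if_present is a
made-up name, not something in matplotlib, and it is a bit stricter than the
two-liner above in that it ignores only the "file already gone" case and
still reports other errors. On Python 2.4 the except clause would have to be
spelled "except OSError, err:".

```python
import errno
import os


def remove_if_present(path):
    """Delete path, ignoring the case where it no longer exists.

    remove_if_present is a hypothetical helper, not part of matplotlib.
    """
    try:
        os.remove(path)
    except OSError as err:
        # Another job may have deleted the file first (ENOENT); that is
        # harmless. Anything else, such as a permission problem, is
        # re-raised so it does not go unnoticed.
        if err.errno != errno.ENOENT:
            raise
```

Calling it twice on the same file is then safe: the first call deletes the
file and the second does nothing.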

Thanks,
Darren

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users
