On Tue, Dec 22, 2009 at 7:17 AM, Ghose, Kaushik <kaushik_gh...@hms.harvard.edu> wrote:
> Hi,
>
> Regarding Greg's post about how to organize plotting code for data. This is
> a common issue encountered regardless of who collected the data or how the
> data was generated. I'm an experimental neuroscientist in that I collect the
> data that I then analyze to test hypotheses and models. After some thrashing
> about I've kind of settled with the following design (parts in common with
> Greg)
>
> 1. Analysis and plotting are separate. Analysis often takes a lot of CPU
> time, whereas plotting doesn't. A given analysis can be plotted in many
> different ways and often I want to tweak plots. I don't want to recompute
> the data each time. So a pragmatic way is to save the analysed data as a
> pickle file and have the plotting code load it.
>
> 2. Analysis code is written to be run non-interactively using the command
> line options package to pass parameters/instructions. Useful when I want to
> run the code on remote machines, or parallelize the code.
>

I don't do very heavy computations that require multiple cores for the
analysis. Most of the time a single fast computer is enough for my analysis
and plotting needs. That said, I want to comment on the last two points of
your e-mail.

>
> 3. No GUIs. This has saved me so much time. I just write plotting code that
> pops up (or saves as pdf) one figure according to command line options. If I
> need a new type of figure I just copy the code into a new script/module and
> save it separately. This is much easier to debug than interactive GUIs that
> do a gazillion things.
>

Sometimes GUIs simplify things a lot, especially when I am taking a quick
look at the data. You can take a look at Traits
[http://code.enthought.com/projects/traits/]. Your opinion might change once
you see how easy it is to design a GUI for your needs.

>
> 4. Source control. Don't delete any code, save it under different folders
> organized by idea or by date.
> I've always found myself asking, months later,
> I made a plot like this, where is it, I want to see what I did there.
>

There is an even better approach for this. You can use a web-based
source-code management system (e.g. code.google.com or
www.sourceforge.net). Either way, they provide a great deal of flexibility
for solo or multi-developer projects.

>
> That's the current credo that has helped me waste a little less time when I
> want to test an idea with my data.
>
> Best
> -Kaushik
>
> ------------------------------
>
> Message: 7
> Date: Mon, 21 Dec 2009 17:42:40 -0500
> From: Greg Novak <no...@ucolick.org>
> Subject: [Matplotlib-users] Best practices for organizing plotting
>         code?
> To: matplotlib-users@lists.sourceforge.net
> Message-ID:
>         <ad0d4fcf0912211442x1261b84ar79945c045a1af...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hello,
> I do computational science and I think I'm typical in that I've
> accumulated a huge pile of code to post-process simulations and draw
> plots. I think the number of lines of plotting code is now greater
> than the number of lines in the actual simulation code... The problem
> with plotting code is that so much of it has such a short
> lifetime---you have an idea, spend some time writing code to draw the
> relevant plot, then the plot isn't interesting and you delete the
> code. Therefore there's little incentive to spend any time making
> sure that plotting code is at all well-designed. Nevertheless, _some_
> of it tends to live a long time and get ever more complicated---then
> the lack of design becomes ever more painful as time goes on. You
> simply don't know at the beginning which code will be thrown away and
> which will live a long time.
>
> Over the years I've developed my favorite way to organize my plotting
> code but it's far from perfect and I'd love to gather ideas from the
> MPL community. So, my current "design principles" are basically
> these:
>
> 1) Don't over-design.
> A simple system that's used consistently is
> better than a half-implemented complicated system. Furthermore, most
> plotting code gets thrown away, so keeping overhead down is one of the
> primary considerations.
>
> 2) Keep computation separate from plotting wherever possible.
> Therefore I have functions like "def compute_optical_depth(...)" that
> compute the physical quantities to be plotted and "def
> plot_optical_depth(...)" that handle everything about the visual
> appearance of the plot. Then when I want to draw some other plot
> involving optical depth, the calculation is neatly packaged into a
> function.
>
> 3) Keep annotation, axis labels, legends, etc, separate from the code
> that actually draws the lines on the axes. This allows you to compose
> plots to a certain extent. I often find myself saying "I want plot B
> to look just like plot A but with this extra information, extra lines,
> extra annotation, or whatever" If the function that draws plot A just
> puts the data on the axes without axis labels, etc, then the function
> that draws plot B can easily use it directly. If the function that
> draws plot A _also_ draws a bunch of annotations and labels, then the
> function that draws plot B must either get rid of them or hope they
> still make sense in the new context.
>
> 4) Don't put clf() and cla() all over the place. When working
> interactively, it's very tempting to put clf()'s into every function
> that draws a plot in order to save a few keystrokes. However, plots
> don't know the context into which they're being drawn, therefore they
> have no authority to clear the screen. They may "own" the whole
> plotting window, or they may be incorporated into a larger context.
> The function that worries about axis labels, annotations, and titles
> is allowed to call cla(). The function that worries about subplots is
> allowed to call clf(). If you might use the code over a slow link
> (e.g.
> connecting to a supercomputing site via residential DSL) then no
> function should call draw() -- that's the user's job.
>
> The upshot of these is that I end up with four layers of functions:
>
> 1) compute_physical_quantity(...): just handles numbers
> 2) draw_physical_quantity(...): has calls to pylab.plot() handling
> colors, linestyles, etc, but not annotations
> 3) some_plot(...): has calls to draw_physical_quantity(),
> some_related_physical_quantity(), along with axis labels, annotations,
> legends, and pylab.cla()
> 4) some_figure(...): has multiple panels with calls to
> pylab.subplot(), pylab.clf(), some_plot_a(), some_plot_b(), etc.
>
> Sometimes layers 2 and 3 are combined because I'm lazy if layer 2
> would really be just a single call to pylab.plot.
>
> Please remember that I'm not writing these down because I think
> they're so great that everyone needs to know about them. I'm hoping
> that people will respond with much better ideas that I can adopt for
> myself.
>
> Thanks,
> Greg
>
> ------------------------------------------------------------------------------
> This SF.Net email is sponsored by the Verizon Developer Community
> Take advantage of Verizon's best-in-class app development support
> A streamlined, 14 day to market process makes app distribution fast and
> easy
> Join now and get one step closer to millions of Verizon customers
> http://p.sf.net/sfu/verizon-dev2dev
> _______________________________________________
> Matplotlib-users mailing list
> Matplotlib-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/matplotlib-users
>

--
Gökhan
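
For what it's worth, points 1 and 2 of Kaushik's list (save the analysed
data as a pickle file, have the plotting code load it, and pass parameters
on the command line) could look roughly like the sketch below. It is only an
illustration under my own assumptions, not anyone's actual code: the function
names, the results.pkl and figure.pdf defaults, and the toy computation are
all made up, and argparse stands in for whichever command-line options
package is actually used.

```python
import argparse
import pickle


def run_analysis(n_samples):
    """Hypothetical stand-in for an expensive computation: squares of 0..n-1."""
    xs = list(range(n_samples))
    ys = [x * x for x in xs]
    # Keep the parameters next to the data so a plot can be traced back later.
    return {"x": xs, "y": ys, "params": {"n_samples": n_samples}}


def save_results(results, path):
    """Write the analysed data to a pickle file so plotting never recomputes it."""
    with open(path, "wb") as fh:
        pickle.dump(results, fh)


def load_results(path):
    """Read back what save_results() wrote."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def plot_results(results, figure_path):
    """Draw one figure from saved results and write it to figure_path."""
    # Imported here (an assumption of this sketch) so that the analysis step
    # can also run on machines where matplotlib is not installed.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no GUI window
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(results["x"], results["y"])
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(figure_path)
    plt.close(fig)


def main(argv=None):
    """Run either the analysis step or the plotting step, chosen by argv."""
    parser = argparse.ArgumentParser(description="Analyse once, plot many times.")
    parser.add_argument("step", choices=["analyze", "plot"])
    parser.add_argument("--data", default="results.pkl")
    parser.add_argument("--n-samples", type=int, default=100)
    parser.add_argument("--figure", default="figure.pdf")
    args = parser.parse_args(argv)

    if args.step == "analyze":
        save_results(run_analysis(args.n_samples), args.data)
    else:
        plot_results(load_results(args.data), args.figure)
```

Calling main(["analyze", "--n-samples", "50"]) writes results.pkl once, and
main(["plot", "--figure", "fig.pdf"]) can then be rerun as often as the
figure needs tweaking without repeating the analysis. A real script would
end with the usual if __name__ == "__main__": main() guard so that it can be
started from the shell.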