On Tue, Dec 22, 2009 at 7:17 AM, Ghose, Kaushik <
kaushik_gh...@hms.harvard.edu> wrote:

> Hi,
>
> Regarding Greg's post about how to organize plotting code for data. This is
> a common issue encountered regardless of who collected the data or how the
> data was generated. I'm an experimental neuroscientist in that I collect the
> data that I then analyze to test hypotheses and models. After some thrashing
> about I've kind of settled with the following design (parts in common with
> Greg)
>
> 1. Analysis and plotting are separate. Analysis often takes a lot of CPU
> time, whereas plotting doesn't. A given analysis can be plotted in many
> different ways and often I want to tweak plots. I don't want to recompute
> the data each time. So a pragmatic way is to save the analysed data as a
> pickle file and have the plotting code load it.
>
> 2. Analysis code is written to be run non-interactively using the command
> line options package to pass parameters/instructions. Useful when I want to
> run the code on remote machines, or parallelize the code.
>
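
A minimal sketch of points 1 and 2 above, under stated assumptions: the
standard-library argparse module stands in for the "command line options
package" (the original does not say which one was used), and run_analysis,
the subcommand names and the results.pkl default are all hypothetical. The
analysis step writes a pickle; the plotting step only reads it, so a plot
can be tweaked without recomputing anything.

```python
import argparse
import pickle
from pathlib import Path


def run_analysis(n_samples):
    """Hypothetical stand-in for an expensive computation."""
    xs = list(range(n_samples))
    ys = [x * x for x in xs]
    return {"x": xs, "y": ys}


def save_results(results, path):
    """Write the analysed data to a pickle file."""
    with open(path, "wb") as fh:
        pickle.dump(results, fh)


def load_results(path):
    """Read analysed data back; only unpickle files you created yourself."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def plot_results(results, output=None):
    """Draw one figure; save it if an output path is given, else show it."""
    # Imported here so the analysis step also runs on machines
    # (e.g. remote nodes) that do not have matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(results["x"], results["y"])
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    if output:
        fig.savefig(output)  # format is inferred from the extension, e.g. .pdf
    else:
        plt.show()


def main(argv=None):
    parser = argparse.ArgumentParser(description="Separate analysis and plotting.")
    sub = parser.add_subparsers(dest="command")

    analyze = sub.add_parser("analyze", help="run the computation and save it")
    analyze.add_argument("--n-samples", type=int, default=100)
    analyze.add_argument("--results", type=Path, default=Path("results.pkl"))

    plot = sub.add_parser("plot", help="plot previously saved results")
    plot.add_argument("--results", type=Path, default=Path("results.pkl"))
    plot.add_argument("--output", default=None, help="e.g. figure.pdf")

    args = parser.parse_args(argv)
    if args.command == "analyze":
        save_results(run_analysis(args.n_samples), args.results)
    elif args.command == "plot":
        plot_results(load_results(args.results), args.output)
    else:
        parser.print_help()


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as analysis.py, this would be run as
"python analysis.py analyze --n-samples 1000" once, and then as
"python analysis.py plot --output fig.pdf" as many times as the plot needs
tweaking.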

My computations are rarely heavy enough to need multiple cores. Most of the
time a single fast computer is enough for my analysis and plotting needs.
That said, I want to comment on the last two points of your e-mail.


>
> 3. No GUIs. This has saved me so much time. I just write plotting code that
> pops up (or saves as pdf) one figure according to command line options. If I
> need a new type of figure I just copy the code into a new script/module and
> save it separately. This is much easier to debug than interactive GUIs that
> do a gazillion things.
>

Sometimes GUIs simplify things a lot, especially when I am taking a quick
look at the data. You can take a look at Traits [
http://code.enthought.com/projects/traits/]. Your opinion might change after
seeing how easy it is to design a GUI for your needs.



>
> 4. Source control. Don't delete any code, save it under different folders
> organized by idea or by date. I've always found myself asking, months later,
> I made a plot like this, where is it, I want to see what I did there.
>

There is an even better approach for this. You can use a web-based
source-code management system (e.g. code.google.com or www.sourceforge.net).
Either one provides a great deal of flexibility for solo or multi-developer
projects.


>
> That's the current credo that has helped me waste a little less time when I
> want to test an idea with my data.
>
> Best
> -Kaushik
>
> ------------------------------
>
> Message: 7
> Date: Mon, 21 Dec 2009 17:42:40 -0500
> From: Greg Novak <no...@ucolick.org>
> Subject: [Matplotlib-users] Best practices for organizing plotting
>        code?
> To: matplotlib-users@lists.sourceforge.net
> Message-ID:
>        <ad0d4fcf0912211442x1261b84ar79945c045a1af...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hello,
> I do computational science and I think I'm typical in that I've
> accumulated a huge pile of code to post-process simulations and draw
> plots.  I think the number of lines of plotting code is now greater
> than the number of lines in the actual simulation code...  The problem
> with plotting code is that so much of it has such a short
> lifetime---you have an idea, spend some time writing code to draw the
> relevant plot, then the plot isn't interesting and you delete the
> code.  Therefore there's little incentive to spend any time making
> sure that plotting code is at all well-designed.  Nevertheless, _some_
> of it tends to live a long time and get ever more complicated---then
> the lack of design becomes ever more painful as time goes on.  You
> simply don't know at the beginning which code will be thrown away and
> which will live a long time.
>
> Over the years I've developed my favorite way to organize my plotting
> code but it's far from perfect and I'd love to gather ideas from the
> MPL community.  So, my current "design principles" are basically
> these:
>
> 1) Don't over-design.  A simple system that's used consistently is
> better than a half-implemented complicated system.  Furthermore, most
> plotting code gets thrown away, so keeping overhead down is one of the
> primary considerations.
>
> 2) Keep computation separate from plotting wherever possible.
> Therefore I have functions like "def compute_optical_depth(...)" that
> compute the physical quantities to be plotted and "def
> plot_optical_depth(...)" that handle everything about the visual
> appearance of the plot.  Then when I want to draw some other plot
> involving optical depth, the calculation is neatly packaged into a
> function.
>
> 3) Keep annotation, axis labels, legends, etc, separate from the code
> that actually draws the lines on the axes.  This allows you to compose
> plots to a certain extent.  I often find myself saying "I want plot B
> to look just like plot A but with this extra information, extra lines,
> extra annotation, or whatever."  If the function that draws plot A just
> puts the data on the axes without axis labels, etc, then the function
> that draws plot B can easily use it directly.  If the function that
> draws plot A _also_ draws a bunch of annotations and labels, then the
> function that draws plot B must either get rid of them or hope they
> still make sense in the new context.
>
> 4) Don't put clf() and cla() all over the place.  When working
> interactively, it's very tempting to put clf()'s into every function
> that draws a plot in order to save a few keystrokes.  However, plots
> don't know the context into which they're being drawn, therefore they
> have no authority to clear the screen.  They may "own" the whole
> plotting window, or they may be incorporated into a larger context.
> The function that worries about axis labels, annotations, and titles
> is allowed to call cla().  The function that worries about subplots is
> allowed to call clf().  If you might use the code over a slow link
> (e.g. connecting to a supercomputing site via residential DSL) then no
> function should call draw() -- that's the user's job.
>
> The upshot of these is that I end up with four layers of functions:
>
> 1) compute_physical_quantity(...): just handles numbers
> 2) draw_physical_quantity(...): has calls to pylab.plot() handling
> colors, linestyles, etc, but not annotations
> 3) some_plot(...): has calls to draw_physical_quantity(),
> some_related_physical_quantity(), along with axis labels, annotations,
> legends, and pylab.cla()
> 4) some_figure(...): has multiple panels with calls to
> pylab.subplot(), pylab.clf(), some_plot_a(), some_plot_b(), etc.
>
> Sometimes layers 2 and 3 are combined because I'm lazy if layer 2
> would really be just a single call to pylab.plot.
>
> Please remember that I'm not writing these down because I think
> they're so great that everyone needs to know about them.  I'm hoping
> that people will respond with much better ideas that I can adopt for
> myself.
>
> Thanks,
> Greg
>
> ------------------------------------------------------------------------------
> This SF.Net email is sponsored by the Verizon Developer Community
> Take advantage of Verizon's best-in-class app development support
> A streamlined, 14 day to market process makes app distribution fast and
> easy
> Join now and get one step closer to millions of Verizon customers
> http://p.sf.net/sfu/verizon-dev2dev
> _______________________________________________
> Matplotlib-users mailing list
> Matplotlib-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/matplotlib-users
>
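
The four layers described in Greg's message above can be sketched as
follows. This is only an illustration under stated assumptions: the
exponential-decay quantity and every function name are hypothetical, and
the sketch passes explicit Axes and Figure objects instead of calling the
pylab functions Greg mentions, which is a substitution on my part. Only
the layer that owns an Axes calls cla(), only the layer that owns the
Figure calls clf(), and nothing calls draw() or show().

```python
import math

import matplotlib.pyplot as plt


def compute_decay(n_points=50, rate=1.0):
    """Layer 1: numbers only, no plotting."""
    xs = [5.0 * i / (n_points - 1) for i in range(n_points)]
    ys = [math.exp(-rate * x) for x in xs]
    return xs, ys


def draw_decay(ax, rate=1.0, **line_kwargs):
    """Layer 2: put the data on the axes; no labels, no clearing."""
    xs, ys = compute_decay(rate=rate)
    return ax.plot(xs, ys, label=f"rate = {rate}", **line_kwargs)


def decay_plot(ax):
    """Layer 3: owns one Axes, so it may clear it and annotate it."""
    ax.cla()
    draw_decay(ax, rate=1.0, color="C0")
    draw_decay(ax, rate=2.0, color="C1", linestyle="--")
    ax.set_xlabel("x")
    ax.set_ylabel("exp(-rate * x)")
    ax.legend()


def decay_figure(fig):
    """Layer 4: owns the Figure, so it may clear it and lay out panels."""
    fig.clf()
    left, right = fig.subplots(1, 2)
    decay_plot(left)
    # Layer 2 is reused directly here, with different annotations.
    draw_decay(right, rate=0.5)
    right.set_yscale("log")
    right.set_title("log scale")
    return fig
```

A caller would then write something like decay_figure(plt.figure()) and
decide for itself whether to call plt.show() or fig.savefig("decay.pdf").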



-- 
Gökhan