On Wed, Mar 24, 2010 at 4:19 PM, Michael Droettboom <md...@stsci.edu> wrote:
> Rich Krauter wrote:
>>>
>>> Rich Krauter wrote:
>>>
>>>>
>>>> Hello,
>>>>
>>>> I am a relatively new user of matplotlib; thank you to the matplotlib
>>>> team for this excellent package.
>>>> I have a question about serializing matplotlib figures.  I have searched
>>>> for serialization options for matplotlib figures but have not found much
>>>> information.  I am interested to hear about serialization use cases and the
>>>> approaches others use in these cases.
>>>>
>>>> Here is the reason I am asking:
>>>>
>>>> My use case for serialization is that I want to build a CouchDB database
>>>> of matplotlib figures.  The database could be accessed from a web
>>>> application (in my case I want to build a django app to create, edit and
>>>> manage figures) or desktop gui, or whatever.  For storage of the figures in
>>>> CouchDB, I am working on JSON representations of matplotlib figures.  The
>>>> JSON could be run through simple python functions to regenerate the
>>>> matplotlib figures.  I have very simple working examples, but to more
>>>> completely test out this approach I would attempt to recreate the plots in
>>>> the matplotlib gallery using JSON representations and a small set of
>>>> (hopefully) very simple python functions which would process the JSON
>>>> markup.
>>>>
>>>> Before I get too far, I wanted to see what others have done for similar
>>>> use cases, make sure I am not missing existing approaches, etc.  I am
>>>> getting ahead of myself now, but if there is broader interest in this
>>>> approach, and no other better solutions exist, I would set up a project on
>>>> Google Code or some other site to work on this.
>>>>
>>
>> On Wed, Mar 24, 2010 at 1:15 PM, Michael Droettboom <md...@stsci.edu>
>> wrote:
>>
>>>
>>> What is the advantage of JSON (in this specific case) over Python source
>>> code?  matplotlib is designed around it and it's more flexible.  Unless
>>> you're planning on automatically manipulating the JSON, I don't see why you
>>> wouldn't just use Python source.
>>>
>>> Mike
>>>
>>>
>>
>> Mike,
>>
>> I don't know that there is much of a benefit to JSON outside of my use
>> case or similar use cases.  I want to manipulate the JSON
>> representation of a figure within a javascript-based web interface to
>> provide dynamic plotting through a web page.  I also want to be able
>> to store and query JSON representations using CouchDB.
>>
>> I am probably not exactly clear on what you mean by "using python
>> source" to represent a figure.  Is there a standard agreed upon way to
>> do this?
>
> In general, most matplotlib users write Python scripts to generate their
> plots.  These scripts usually read in data from an external file in any
> number of formats.  (The format tends to be domain-specific, but matplotlib
> provides support for a number of CSV formats, and Numpy itself supports a
> number of ways of reading arrays.)  matplotlib tends to be agnostic about
> data (as long as you can convert it to a Numpy array somehow, it's happy),
> but has a clearly defined API for plot types and styles.
>>
>>  I do have python source code representations of figures.
>> i.e. I have dict representations of matplotlib figures.  The dicts
>> have a "required" internal structure.  I feed the dict to a function
>> which regenerates the figure graphic from that structure.  If I want
>> to update the plot, I just change the contents of the dict data
>> structure representing the plot, not the source code that is used to
>> generate the figure. If I instead had a JSON object representation of
>> a figure, I would convert it to a python dict and use the same
>> function as before to produce the figure.
>>
>
> I guess I have trouble seeing why a dictionary representation which is then
> interpreted to convert it to function calls is better than just making the
> function calls directly.  That's the "interface" to matplotlib that is known
> and tested.
>

Here are my reasons why a structured representation (dict, JSON, XML,
...)  is useful:

- I want to access the same plot representation through both python
and through javascript.  I need to access it in python to run MPL and
create plot images, and I want to use javascript to build the user
interface.

- I want to separate the plot content from the plot generation.   I
can serialize a data structure containing plot contents more easily
than I can record the commands a user might call to generate a plot.
The content of the plot is not python specific, only the generation of
the MPL plot is.  I need to be able to serialize the content to
support later modifications.
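
To make the second point concrete, here is a minimal sketch (not the code
referred to in this thread) of a plot described as JSON and regenerated by
one small Python function.  The key names ("title", "xlabel", "ylabel",
"lines", "figsize") and the function names figure_from_spec and render_png
are hypothetical, chosen only for illustration; the matplotlib calls
themselves are standard.

```python
import io
import json

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def figure_from_spec(spec):
    """Build a matplotlib Figure from a plain dict (hypothetical layout)."""
    fig = Figure(figsize=spec.get("figsize", (6, 4)))
    FigureCanvasAgg(fig)  # attach a non-interactive Agg canvas for rendering
    ax = fig.add_subplot(1, 1, 1)
    for line in spec.get("lines", []):
        ax.plot(line["x"], line["y"], label=line.get("label"))
    ax.set_title(spec.get("title", ""))
    ax.set_xlabel(spec.get("xlabel", ""))
    ax.set_ylabel(spec.get("ylabel", ""))
    return fig


def render_png(spec_json):
    """Decode a JSON string, build the figure, and return PNG bytes."""
    fig = figure_from_spec(json.loads(spec_json))
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    return buf.getvalue()


# The same JSON text could be produced and edited by a javascript front end.
spec_json = json.dumps({
    "title": "Example",
    "xlabel": "x",
    "ylabel": "y",
    "lines": [{"x": [0, 1, 2], "y": [0, 1, 4], "label": "squares"}],
})
png_bytes = render_png(spec_json)
```

Changing the plot then means editing the JSON (for example the "title"
value) and calling render_png again; the Python function stays the same.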



> The only use case I can imagine where a dictionary might be preferable would
> be if an external tool needs to read in the dictionary, modify it and spit
> it back out.  Reading arbitrary Python code is of course extremely hairy,
> whereas the JSON dictionary could be defined to be a more limited and
> manageable subset.  Another possible advantage may be security related -- if
> you need to run untrusted plot code, you certainly don't want to be running
> untrusted Python code.
>>
>> I haven't found much discussion about serialization of matplotlib
>> figures, but I probably have not searched well enough, or maybe it is
>> not a high interest topic.  The discussion I have found seems to
>> suggest using the script you used to create the figure as the
>> serialization of that figure. To modify the figure, you modify the
>> script and rerun it.
>
> Yes -- that's the general consensus (at least among the core developers)
> when the discussion comes up.  There have been discussions and experiments
> using enthought.Traits that might make plots serializable and malleable, but
> it's a significant refactoring of matplotlib to take such an approach, for a
> fairly minor gain.  It's also extremely difficult to invent a serialization
> that would survive version upgrades to matplotlib.  One advantage of the
> script approach is that when APIs change in a backward-incompatible way, it
> is generally easier for end users to update their plots.  If plots were in a
> less human readable/writable format the changes required may be less
> obvious.
>>

I can see why an MPL-internal serialization capability would be low on
the priority list.  It's hard to do and no one is really asking for
it.  I don't think you were implying this, but just to be clear I am
not requesting a change to MPL or complaining about its functionality.
I hope I didn't give the impression that I was.

Agreed that API changes could be difficult to deal with.
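
One way a stored representation might cope with such changes is to carry
a version number and be upgraded step by step when it is loaded.  The
sketch below is only an illustration under assumptions: the "version"
key, the v1 "line" / v2 "lines" layouts, and the names upgrade and
_v1_to_v2 are all hypothetical, and it says nothing about how often the
MPL API actually changes.

```python
CURRENT_VERSION = 2


def _v1_to_v2(spec):
    """Hypothetical change: v1 stored one 'line'; v2 stores a list 'lines'."""
    upgraded = {k: v for k, v in spec.items() if k != "line"}
    upgraded["lines"] = [spec["line"]] if "line" in spec else []
    upgraded["version"] = 2
    return upgraded


# Maps a version number to the function that upgrades it by one step.
MIGRATIONS = {1: _v1_to_v2}


def upgrade(spec):
    """Apply migrations one step at a time until the spec is current."""
    spec = dict(spec)  # shallow copy so the caller's dict is not modified
    version = spec.get("version", 1)
    while version < CURRENT_VERSION:
        spec = MIGRATIONS[version](spec)
        version = spec["version"]
    if version != CURRENT_VERSION:
        raise ValueError(f"unsupported spec version: {version}")
    return spec


old = {"version": 1, "title": "Example", "line": {"x": [0, 1], "y": [0, 1]}}
new = upgrade(old)
```

Only the newest layout then has to be understood by the code that draws
the figure; older stored figures are converted on the way in.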

>>  What I would like to have (and what I have some
>> very preliminary examples for) are versioned data structures that can
>> be converted to matplotlib figures without modifying any python source
>> code (other than the structured representation of the figure itself.)
>>  However, I don't know how much the matplotlib API changes, and an
>> approach like this may be very sensitive to those changes.
>>
>
> I don't understand the motivation to avoid modifying Python source code.  If
> you want to have common functionality that needs to change en masse, you can
> use Python functions in a library.  You could write Python scripts defining
> a plot that are nothing more than data and a single function call to said
> library.

I am not opposed to modifying python source code.  What I meant is
that I tried to separate the content of the figure from its
generation.  There is nothing python-specific about the content of a
figure.  To change a plot I change its content (as represented by a
python dict, XML, JSON, etc.), not the python code used to generate
the figure from the content.  I can add support for other MPL features
by changing the JSON, XML, python dict representation; I shouldn't
have to add server side python code to support additional MPL
features.
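
As a rough sketch of how that could work (an assumption on my part, not
a description of existing code): the representation lists method names
with their arguments, and a small generic function replays them on the
target object.  The entry layout ("method", "args", "kwargs"), the
ALLOWED_CALLS set, and the Recorder stand-in are hypothetical.  The
allow-list also touches on the security point above: only names in the
set can ever be called, at the cost of adding a name to the set when a
new feature is wanted.

```python
# Method names that exist on matplotlib Axes objects; anything else is refused.
ALLOWED_CALLS = {"plot", "set_title", "set_xlabel", "set_ylabel", "legend"}


def apply_calls(target, calls):
    """Replay a list of {"method", "args", "kwargs"} entries on target."""
    for call in calls:
        name = call["method"]
        if name not in ALLOWED_CALLS:
            raise ValueError(f"method not allowed: {name!r}")
        getattr(target, name)(*call.get("args", []), **call.get("kwargs", {}))


class Recorder:
    """Stand-in for a matplotlib Axes so the sketch runs without matplotlib."""

    def __init__(self):
        self.log = []

    def __getattr__(self, name):
        # Called only for attributes not found normally, i.e. method names.
        def method(*args, **kwargs):
            self.log.append((name, args, kwargs))
        return method


calls = [
    {"method": "plot", "args": [[0, 1, 2], [0, 1, 4]],
     "kwargs": {"label": "squares"}},
    {"method": "set_title", "args": ["Example"]},
]
axes = Recorder()
apply_calls(axes, calls)
```

With a real Axes object in place of Recorder, the same list would draw
the line and set the title.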

> Are you indexing the JSON at a fine-grained level in the couchdb, or are
> they ultimately just blobs anyway?  In which case a Python blob or a JSON
> blob should make no difference.
>

Good point, they will probably mostly be blobs, with some associated
metadata to query against.
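
For example, a stored document might look roughly like the sketch below:
a few top-level metadata fields to query on, plus the figure description
kept as an opaque nested value.  Apart from "_id", which is CouchDB's
document identifier field, every field name and value here is a
hypothetical placeholder, and no CouchDB client calls are shown.

```python
import json


def make_document(doc_id, spec, author, tags):
    """Wrap a figure spec with top-level metadata fields (hypothetical layout)."""
    return {
        "_id": doc_id,                   # CouchDB's document identifier field
        "type": "figure",
        "author": author,
        "tags": sorted(tags),
        "title": spec.get("title", ""),  # copied to the top level for querying
        "spec": spec,                    # figure description, treated as a blob
    }


spec = {"title": "Example", "lines": [{"x": [0, 1], "y": [0, 1]}]}
doc = make_document("figure-0001", spec, author="example-user",
                    tags=["line", "demo"])
body = json.dumps(doc)  # JSON text that could be stored as the document body
```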

> I'm not trying to dissuade you from creating a JSON frontend if there's a
> strong advantage.  But keeping that frontend in sync with the progress of
> matplotlib may be difficult, depending on how much coverage you want to
> provide.
>
> Mike

Understood, and thanks for the input.

Rich

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users
