Re: [matplotlib-devel] Sample data: a proposal

Andrew Straw Sun, 12 Sep 2010 08:31:16 -0700

On 09/12/2010 07:10 AM, Jouni K. Seppänen wrote:
> A while ago there was a discussion [1] about how using the
> get_sample_data function in building the documentation is a problem for
> Debian packagers. Let me see if I understand the goals of
> get_sample_data correctly:
>
> * we want to enable users to run examples they find in the gallery
>    without downloading extra files;
>
> * we don't want to package all the sample data with matplotlib, either
>    because it is too large, or because it changes more often than we
>    release new versions.
>


* Also, we want to keep the sample data out of the version control
repository that holds MPL proper, so that downloading the MPL source
code does not also pull in the sample data. (This is one of the sticking
points for a move to git.)

> Here's what I suggest:
>
> 1. Package the sample data in a separate zip file that users can
>     download and expand in e.g. ~/.matplotlib/sample_data if they like.
>     This file could be released more often than matplotlib, if needed.
>     Debian can use this as one source file and package it as a separate
>     deb file.
>
> 2. Make get_sample_data look first in the place where the zip file could
>     have been expanded, and only if the required file is not found, try
>     to obtain it from the web. Add an option to disable the network
>     access. This is different from what we do now, because now
>     get_sample_data always tries to check if there is a newer version
>     available, which apparently doesn't work reliably on unconnected
>     computers.
>
> 3. To make this work, agree that sample data files are immutable: if a
>     new version is needed, it needs to have a new name (and thus the
>     examples using it need to be updated). The files have not been
>     changed a lot [2], so I don't think this is very much of a burden.
>
> What do you think?
>
>    

#1 and #2 seem reasonable to me.
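
To make #2 concrete, here is a rough sketch of the lookup order, not the
actual get_sample_data code. The function name, the base URL and the
allow_network flag are all made up for illustration; the directory is the
one suggested in #1.

```python
import os
import urllib.request

# Assumed location where a user (or a Debian package) unpacked the zip file.
SAMPLE_DIR = os.path.expanduser("~/.matplotlib/sample_data")
# Placeholder URL, not a real server.
BASE_URL = "http://example.invalid/sample_data/"


def find_sample_data(fname, sample_dir=SAMPLE_DIR, base_url=BASE_URL,
                     allow_network=True):
    """Return a local path to fname, touching the network only if needed."""
    local_path = os.path.join(sample_dir, fname)
    if os.path.exists(local_path):
        # Found locally: return it without checking for a newer version.
        return local_path
    if not allow_network:
        raise FileNotFoundError(
            "%s not found in %s and network access is disabled"
            % (fname, sample_dir))
    os.makedirs(sample_dir, exist_ok=True)
    tmp_path = local_path + ".part"
    urllib.request.urlretrieve(base_url + fname, tmp_path)
    os.replace(tmp_path, local_path)  # only expose complete downloads
    return local_path
```

With allow_network=False this never opens a connection, which is what a
documentation build on an unconnected machine would want.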

I don't like #3. For the same reasons that we want to separate the rest
of the sample data (smaller download, smaller repository, and separation
of code and non-essential data), I think the test comparison images
should live with the sample data. Having to deal with renames in the tests
would be annoying. Two alternative ideas for handling the versioning
issue: A) Add a .py file to the main source repository which is a list of
sample data filenames and checksums. If a sample data file doesn't
exist, or its checksum is wrong, it can be downloaded. B) The source
could simply state the sample data version number it requires, and the
sample data itself could be versioned.
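
A minimal sketch of idea A, under some assumptions: the manifest is a plain
dict in a .py file, the checksum is SHA-256 (any digest would do), and the
names SAMPLE_DATA_CHECKSUMS, file_sha256 and needs_download are
hypothetical. The single manifest entry is a placeholder, not a real sample
data file.

```python
import hashlib
import os

# Hypothetical manifest kept in the main source repository.
# Placeholder entry: the digest is the SHA-256 of an empty file.
SAMPLE_DATA_CHECKSUMS = {
    "empty.dat":
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(fname, sample_dir, manifest=SAMPLE_DATA_CHECKSUMS):
    """True if fname is missing locally or its checksum is wrong."""
    path = os.path.join(sample_dir, fname)
    if not os.path.exists(path):
        return True
    return file_sha256(path) != manifest[fname]
```

Updating a comparison image would then mean changing one checksum line in
the source repository, with no renames in the tests.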
