On 09/12/2010 07:10 AM, Jouni K. Seppänen wrote:
> A while ago there was a discussion [1] about how using the
> get_sample_data function in building the documentation is a problem for
> Debian packagers. Let me see if I understand the goals of
> get_sample_data correctly:
>
> * we want to enable users to run examples they find in the gallery
>   without downloading extra files;
>
> * we don't want to package all the sample data with matplotlib, either
>   because it is too large, or because it changes more often than we
>   release new versions.
>
* Also, we want the sample data not to be in the same version control
  repository as MPL proper, so that when we download the MPL source code
  itself, we don't get the sample data. (This is one of the sticking
  points for a move to git.)

> Here's what I suggest:
>
> 1. Package the sample data in a separate zip file that users can
>    download and expand in e.g. ~/.matplotlib/sample_data if they like.
>    This file could be released more often than matplotlib, if needed.
>    Debian can use this as one source file and package it as a separate
>    deb file.
>
> 2. Make get_sample_data look first in the place where the zip file could
>    have been expanded, and only if the required file is not found, try
>    to obtain it from the web. Add an option to disable the network
>    access. This is different from what we do now, because now
>    get_sample_data always tries to check if there is a newer version
>    available, which apparently doesn't work reliably on unconnected
>    computers.
>
> 3. To make this work, agree that sample data files are immutable: if a
>    new version is needed, it needs to have a new name (and thus the
>    examples using it need to be updated). The files have not been
>    changed a lot [2], so I don't think this is very much of a burden.
>
> What do you think?
>

#1 and #2 seem reasonable to me.

I don't like #3 -- for the same reasons as we want to separate the rest
of the sample data (smaller download, smaller repository, and separation
of code and non-essential data), I think the test comparison images
should be with the sample data. Having to deal with renames in the tests
would be annoying.

Two alternative ideas to handle the versioning issue:

A) Add a .py file in the main source repository which is a list of
   sample data filenames and checksums. If a sample data file doesn't
   exist, or its checksum is wrong, it can be downloaded.

B) The source could simply record the sample data version number it
   requires, and the sample data itself could be versioned.
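
To make idea A concrete, here is a rough sketch of how it could combine with the local-first lookup from #2. Everything named here is hypothetical and not existing matplotlib API: the SAMPLE_DATA_CHECKSUMS manifest, the find_sample_data function, and the fetch callback are made up for illustration, and SHA-256 is just an assumed choice of checksum.

```python
import hashlib
import os

# Hypothetical manifest: filename -> expected SHA-256 hex digest.
# In idea A this would live in a .py file in the main source repository.
SAMPLE_DATA_CHECKSUMS = {
    "example.csv": hashlib.sha256(b"x,y\n1,2\n").hexdigest(),
}


def file_checksum(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_sample_data(name, data_dir, fetch=None, manifest=SAMPLE_DATA_CHECKSUMS):
    """Return a local path to sample file `name`, downloading only if needed.

    `fetch(name)` is an assumed callback returning the file contents as
    bytes; pass None to disable network access entirely.
    """
    expected = manifest[name]  # KeyError for names not in the manifest
    path = os.path.join(data_dir, name)

    # Local copy first: use it if it exists and matches the manifest.
    if os.path.exists(path) and file_checksum(path) == expected:
        return path

    if fetch is None:
        raise IOError(
            "%s is missing or stale and network access is disabled" % name
        )

    # Missing or wrong checksum: download and verify before saving.
    data = fetch(name)
    if hashlib.sha256(data).hexdigest() != expected:
        raise IOError("downloaded %s does not match the expected checksum" % name)

    os.makedirs(data_dir, exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With something like this, a file can change without being renamed: updating its checksum in the manifest is enough to make stale local copies get re-downloaded, and a machine with no network still works as long as the expanded zip file matches the manifest.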