Re: [GRASS-dev] matplotlib example script
On Jul 25, 2008, at 6:56 PM, Glynn Clements wrote: Michael Barton wrote: ax.hist() and numpy.histogram() are broken by design. They should accept an iterator as an argument. Requiring the entire data to be passed as a list makes them useless for large amounts of data. Well. For *really* large amounts of data I suppose. And indeed sometimes people have Gb maps to work with. However, 15 sec. for histogramming a 30 million cell, 23Mb ASTER file isn't too bad. As you say, it would faster to use C modules in GRASS for binning if we had them. Your comments bring out a number of important points that we need to consider both in this particular case, and more in general too. Here are a few responses. Things can be written if there is a need for them. But doing things right is more important than doing things right now. I agree. This is a proposal accompanied by examples so that others can try it out. Sort of like Cairo. Many of the current problems with GRASS are due to cases where expediency was allowed to win out over good design (or *any* design). [As to the specific task of binning, you could probably get a suitable tool within 5 minutes by scavenging from r.quantile.] This would be good to add to r.stats, which already has an interface for this but which only works with float maps. Sure, we can provide basic functionality, but where do you draw the line? Which features of a real statistics package *wouldn't* be useful in GRASS? I'm really quite worried about the potential for feature creep in this area. I guess I'd leave it up to the user/developer base to decide what kinds of functionality we need. If it is something regularly necessary, someone will probably craft a script for it. If this is really useful, it might get translated into C-code. Visualization and spatial analysis is an important part of GIS functionality to me. In fact, a lot of graphing programs and stat packages would have difficulty in dealing with 20, 50, or 100 million points. MatPlotLib (or something like it) seems like a good tool to have available for development. It's also an encouragement for people to begin to develop scripts in Python. It's also far too low-level. If you start writing lots of scripts which call matplotlib directly, you will quickly end up with a situation where one script lets you set the axis colour and the label font but not the symbols, another lets you control the symbols but not the font, etc. It's not nearly as low level as d.graph or bash. Using pyplot or pylab dispenses with even more code. But it's important to see how this would work in a GRASS environment rather than just for creating general purpose graphs. This means it's up to the script/module/GUI developer to decide how much control is appropriate to pass on to users. [I'm using stylistic properties here for simplicity; there are probably other properties which are far more important, e.g. being able to zoom, select log/linear scaling, etc. If you can't find or aren't happy with existing graphing libraries, then write one. But don't require each script to include its own written-from-scratch-in-one-hour graphing library. My thought was that this could potentially serve as a 'standard' graphic library for Python scripting in GRASS. Depending on the graphic requirements, it could also serve for some of the 'built in' graphing functions in a wxPython GUI environment--that is, things built in to the GUI or modules that ship with GRASS. Obviously, we'd need to try it out and maybe look at other possible alternatives. This library is widely used and well-maintained, and has a lot of functionality. So it seems like a good candidate for something like this. What is required is something at a level where the script generates some data then indicates that the data should be displayed as a scatter plot. The library should take care of the rest (scales, sizes, colours, ...) I agree, especially for 'built-in' graphing like histogramming. without the script having to explicitly read options from the user and pass them to matplotlib. The idea is not that we should attempt to create general-purpose graphing applications, but to provide a flexible set of tools that GRASS developers and sophisticated users can use. Are you going to hard-code the fonts, colours, line width, symbols, etc, or allow the user to set them? I guess it depends on the application. Sometimes this is superfluous and other times it's very useful. Assuming the latter, are you going to force them to specify font= linecolor= textcolor= etc for every command that they type, or allow them to set defaults? Assuming the latter, how will the script obtain this information? The examples that I did gave the user virtually no choice over graph formatting or the type of graph displayed, although it is easy enough to build it into a script GUI. The existence of a library which
Re: [GRASS-dev] matplotlib example script
Hi Michael, I got a nice CASC2D water depth histogram, in a different color... It nicely work for other datasets... without NAN, which are different from NULL values, I wish we had a G_set_nan_null_value() function too... Yann 2008/7/26 Michael Barton [EMAIL PROTECTED]: Yann, Have you tried this on a dataset without NAN values? Michael __ C. Michael Barton, Professor of Anthropology Director of Graduate Studies School of Human Evolution Social Change Center for Social Dynamics Complexity Arizona State University Tempe, AZ 85287-2402 USA voice: 480-965-6262; fax: 480-965-7671 www: http://www.public.asu.edu/~cmbarton On Jul 25, 2008, at 4:09 PM, Yann Chemin wrote: Hi, after chmod +x, it created a nice histogram of elevation.10m. So I tried on floating point image created by some process. here is the problem: Traceback (most recent call last): File /home/yann/histogram_mpldemo.py, line 173, in module main() File /home/yann/histogram_mpldemo.py, line 159, in main alpha=0.75) File /usr/lib/python2.5/site-packages/matplotlib/axes.py, line 6035, in hist normed=bool(normed), new=True) File /usr/lib/python2.5/site-packages/numpy/lib/function_base.py, line 236, in histogram range = (a.min(), a.max()) ValueError: zero-size array to ufunc.reduce without identity this happens because the data has NAN (see r.info output): Range of data:min = nan max = nan Is there a way to discard NAN in a.min() and a.max() calculations? Or is there a NAN-resistant mode in matplotlib? Yann -- Yann Chemin International Rice Research Institute Office: http://www.irri.org/gis Perso: http://www.freewebs.com/ychemin attachment: myhistogram.png___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] matplotlib example script
Hi Michael, I tried it in Debian/Sid and here is what I got, a strange permission error Any idea/suggestion? -- GRASS 7.0.svn (spearfish60):~ python histogram_mpldemo.py input=elevation.10m bins=25 output=~/myhistogram fillcolor=20:20:200 start execl() failed: Permission denied GRASS 7.0.svn (spearfish60):~ -- thanks Yann 2008/7/25 Yann Chemin [EMAIL PROTECTED]: Hi Michael, Nice histogram! I am going to get python compiled and try it here... Yann -- Yann Chemin International Rice Research Institute Office: http://www.irri.org/gis Perso: http://www.freewebs.com/ychemin ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] matplotlib example script
Yann Chemin wrote: I tried it in Debian/Sid and here is what I got, a strange permission error Any idea/suggestion? chmod +x histogram_mpldemo.py ./histogram_mpldemo.py Even if you invoke the python interpreter explicitly, the script will run g.parser, which will re-execute the script directly, so it needs to be executable. This is true for any script which uses g.parser. -- Glynn Clements [EMAIL PROTECTED] ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] matplotlib example script
Michael Barton wrote: Even if it takes just as long, you're less likely to have it fail because plotlist consumed all available RAM. As it stands, plotlist will have one entry for every non-null cell in the raster. When processing bulk data, anything that uses a fixed amount of memory (e.g. one integer per bin) is preferable to using memory proportional to the size of the input. Hence the recommendation to iterate over the lines of r.stats' output rather than read it all into a list then iterate over the list. To do a histogram, I need to send ax.hist a list of values. So I don't know how I can get away without creating that list unless I use a completely different algorithm (something from numpy?). Don't use axes.hist(), use axes.bar(). I.e. leave the calculations to GRASS, and only use matplotlib for plotting. ax.hist() and numpy.histogram() are broken by design. They should accept an iterator as an argument. Requiring the entire data to be passed as a list makes them useless for large amounts of data. If we were to impose a requirement that maps must fit into memory, writing GRASS modules would be significantly simpler. GRASS would also be significantly less useful. On the other hand, in spite of recent improvements, d.hist is still pretty ugly, with formatting issues like varying font sizes on a single axis. And it is not very flexible. It would be nice to see where standard deviations lie, customize axis formatting, etc. For this module in particular, it is probably better to bin the data in another way than I have, and input it into one of the matplotlib plot methods. Actually, for statistical information, the most important feature is the ability output data in formats that are useful to *real* statistical software. There must be packages out there (even free ones) which will do a far better job than anything we are going to provide. Sure, we can provide basic functionality, but where do you draw the line? Which features of a real statistics package *wouldn't* be useful in GRASS? I'm really quite worried about the potential for feature creep in this area. And library of plotting functions isn't a statistics package. You want something where GRASS hands over the data and forgets about it. We shouldn't be responsible for communicating from the user to the software details such as what type of graph to draw, or the colours or symbols or whatever. -- Glynn Clements [EMAIL PROTECTED] ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] matplotlib example script
Hi, after chmod +x, it created a nice histogram of elevation.10m. So I tried on floating point image created by some process. here is the problem: Traceback (most recent call last): File /home/yann/histogram_mpldemo.py, line 173, in module main() File /home/yann/histogram_mpldemo.py, line 159, in main alpha=0.75) File /usr/lib/python2.5/site-packages/matplotlib/axes.py, line 6035, in hist normed=bool(normed), new=True) File /usr/lib/python2.5/site-packages/numpy/lib/function_base.py, line 236, in histogram range = (a.min(), a.max()) ValueError: zero-size array to ufunc.reduce without identity this happens because the data has NAN (see r.info output): Range of data:min = nan max = nan Is there a way to discard NAN in a.min() and a.max() calculations? Or is there a NAN-resistant mode in matplotlib? Yann ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] matplotlib example script
The code is pretty simple. I would guess that NAN data could be discarded when making the list that is histogrammed. I'm kind of surprised that it shows up as r.stats is run to discard nulls. But I guess these aren't seen as nulls by r.stats. Michael __ C. Michael Barton, Professor of Anthropology Director of Graduate Studies School of Human Evolution Social Change Center for Social Dynamics Complexity Arizona State University Tempe, AZ 85287-2402 USA voice: 480-965-6262; fax: 480-965-7671 www: http://www.public.asu.edu/~cmbarton On Jul 25, 2008, at 4:09 PM, Yann Chemin wrote: Hi, after chmod +x, it created a nice histogram of elevation.10m. So I tried on floating point image created by some process. here is the problem: Traceback (most recent call last): File /home/yann/histogram_mpldemo.py, line 173, in module main() File /home/yann/histogram_mpldemo.py, line 159, in main alpha=0.75) File /usr/lib/python2.5/site-packages/matplotlib/axes.py, line 6035, in hist normed=bool(normed), new=True) File /usr/lib/python2.5/site-packages/numpy/lib/function_base.py, line 236, in histogram range = (a.min(), a.max()) ValueError: zero-size array to ufunc.reduce without identity this happens because the data has NAN (see r.info output): Range of data:min = nan max = nan Is there a way to discard NAN in a.min() and a.max() calculations? Or is there a NAN-resistant mode in matplotlib? Yann ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] matplotlib example script
Yann, Have you tried this on a dataset without NAN values? Michael __ C. Michael Barton, Professor of Anthropology Director of Graduate Studies School of Human Evolution Social Change Center for Social Dynamics Complexity Arizona State University Tempe, AZ 85287-2402 USA voice: 480-965-6262; fax: 480-965-7671 www: http://www.public.asu.edu/~cmbarton On Jul 25, 2008, at 4:09 PM, Yann Chemin wrote: Hi, after chmod +x, it created a nice histogram of elevation.10m. So I tried on floating point image created by some process. here is the problem: Traceback (most recent call last): File /home/yann/histogram_mpldemo.py, line 173, in module main() File /home/yann/histogram_mpldemo.py, line 159, in main alpha=0.75) File /usr/lib/python2.5/site-packages/matplotlib/axes.py, line 6035, in hist normed=bool(normed), new=True) File /usr/lib/python2.5/site-packages/numpy/lib/function_base.py, line 236, in histogram range = (a.min(), a.max()) ValueError: zero-size array to ufunc.reduce without identity this happens because the data has NAN (see r.info output): Range of data:min = nan max = nan Is there a way to discard NAN in a.min() and a.max() calculations? Or is there a NAN-resistant mode in matplotlib? Yann ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] matplotlib example script
Michael Barton wrote: ax.hist() and numpy.histogram() are broken by design. They should accept an iterator as an argument. Requiring the entire data to be passed as a list makes them useless for large amounts of data. Well. For *really* large amounts of data I suppose. And indeed sometimes people have Gb maps to work with. However, 15 sec. for histogramming a 30 million cell, 23Mb ASTER file isn't too bad. As you say, it would faster to use C modules in GRASS for binning if we had them. Things can be written if there is a need for them. But doing things right is more important than doing things right now. Many of the current problems with GRASS are due to cases where expediency was allowed to win out over good design (or *any* design). [As to the specific task of binning, you could probably get a suitable tool within 5 minutes by scavenging from r.quantile.] Sure, we can provide basic functionality, but where do you draw the line? Which features of a real statistics package *wouldn't* be useful in GRASS? I'm really quite worried about the potential for feature creep in this area. I guess I'd leave it up to the user/developer base to decide what kinds of functionality we need. If it is something regularly necessary, someone will probably craft a script for it. If this is really useful, it might get translated into C-code. Visualization and spatial analysis is an important part of GIS functionality to me. In fact, a lot of graphing programs and stat packages would have difficulty in dealing with 20, 50, or 100 million points. MatPlotLib (or something like it) seems like a good tool to have available for development. It's also an encouragement for people to begin to develop scripts in Python. It's also far too low-level. If you start writing lots of scripts which call matplotlib directly, you will quickly end up with a situation where one script lets you set the axis colour and the label font but not the symbols, another lets you control the symbols but not the font, etc. [I'm using stylistic properties here for simplicity; there are probably other properties which are far more important, e.g. being able to zoom, select log/linear scaling, etc.] If you can't find or aren't happy with existing graphing libraries, then write one. But don't require each script to include its own written-from-scratch-in-one-hour graphing library. What is required is something at a level where the script generates some data then indicates that the data should be displayed as a scatter plot. The library should take care of the rest (scales, sizes, colours, ...) without the script having to explicitly read options from the user and pass them to matplotlib. Are you going to hard-code the fonts, colours, line width, symbols, etc, or allow the user to set them? Assuming the latter, are you going to force them to specify font= linecolor= textcolor= etc for every command that they type, or allow them to set defaults? Assuming the latter, how will the script obtain this information? Also, are these scripts going to be usable from within the GUI? (I.e. display the graphics in a window created by the GUI, not just dump a PNG file to the disk). How are the file format, dimensions, filename etc communicated between the two? Will the script be able to communicate the set of available display attributes to the GUI (in such a way that it can distinguish what to display from how to display it)? This, in a nutshell, is the difference between software engineering and coding. I've worked with too many coders; and by worked with, I mean cleaned up after. That isn't to say that you shouldn't start to write anything until you have a complete architectural design. Just that you should expect the first dozen or so attempts to serve as learning exercises rather than something which will eventually be used. And expect the first few attempts to tell you more about what *won't* work than what will. In many regards, trial-and-error can produce a better design than trying to operate from a purely theoretical perspective. Hindsight tends to be more accurate than foresight, albeit with the drawback that it takes rather more effort to obtain. -- Glynn Clements [EMAIL PROTECTED] ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] matplotlib example script
Yann Chemin wrote: this happens because the data has NAN (see r.info output): Range of data:min = nan max = nan Is there a way to discard NAN in a.min() and a.max() calculations? Or is there a NAN-resistant mode in matplotlib? NaN should never appear in GRASS rasters. When it does, it's a bug in the module which created the map (do you know which module was responsible in this case?). One of the things which is on my to-do list for 7.x is to change G_is_[fd]_null_value() to treat *all* NaN values as null (currently, it checks for a specific bit pattern, which is just one possible NaN value). Once that is done, then r.stats' -n flag will prevent NaNs from making it into the output. In the meantime, you can clean up maps which contain NaNs with e.g.: r.mapcalc 'newmap = if(oldmap == oldmap,oldmap,null())' or, with recent versions of r.null: r.null map setnull=nan -- Glynn Clements [EMAIL PROTECTED] ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] matplotlib example script
I guess the mailbot stripped out my small attachments. So here are links for those of you interested Script: http://www.public.asu.edu/~cmbarton/files/grass_scripts/histogram_mpldemo.py Example output: http://www.public.asu.edu/~cmbarton/files/grass_scripts/myhistogram.png Michael On Jul 24, 2008, at 9:12 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Date: Thu, 24 Jul 2008 09:12:41 -0700 From: Michael Barton [EMAIL PROTECTED] Subject: [GRASS-dev] matplotlib example script To: grass developers grass-dev@lists.osgeo.org Message-ID: [EMAIL PROTECTED] Content-Type: text/plain; charset=us-ascii I've attached an example script and output for using the MatPlotLib Python library in GRASS. This script replicates the functionality of d.histogram, but makes a much nicer looking plot. It also has more options, although I've included only a very few of the formatting possibilities here. The script sends the output from r.stats into 9 lines of code to produce the histogram with formatting options. Most of it is done by 4 lines. This can be run from the command line and does not require a GUI to be installed. It writes to a PNG file, but could easily be set to create a TclTk or wxPython interactive window as a user-selectable option. The following command produces the attached histogram: histogram_mpldemo.py input=elevation.10m bins=25 output=/users/ cmbarton/documents/myhistogram fillcolor=20:20:200 Maybe I'll try an xy graph of 2 maps/images with density plot options next. To run this, you need to install MatPlotLib http://matplotlib.sourceforge.net/ , which requires Numpy http://numpy.scipy.org/. Both are installable as binary eggs via the Python easy_install package manager or from most Linux package managers. Of course you need Python too. Enjoy Michael C. Michael Barton, Professor of Anthropology Director of Graduate Studies School of Human Evolution Social Change Center for Social Dynamics Complexity Arizona State University Phone: 480-965-6262 Fax: 480-965-7671 www: www.public.asu.edu/~cmbarton ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] matplotlib example script
Michael Barton wrote: I've attached an example script and output for using the MatPlotLib Python library in GRASS. One of the Apple Mail developers needs a dose of the clue-stick. The script (and image) were attached to the HTML alternative of the multipart/alternative section, so they don't exist for anyone using the plain-text alternative. This script replicates the functionality of d.histogram, but makes a much nicer looking plot. It also has more options, although I've included only a very few of the formatting possibilities here. The script sends the output from r.stats into 9 lines of code to produce the histogram with formatting options. Most of it is done by 4 lines. This can be run from the command line and does not require a GUI to be installed. It writes to a PNG file, but could easily be set to create a TclTk or wxPython interactive window as a user-selectable option. Before we start using MatPlotLib extensively, we need to figure out how to handle output format selection so that all script can work with any backend, and the scripts can be used alongside d.* commands without the caller needing to distingiush between them. Or, at least, we need to figure out an interface that will allow us to do that later without having to modify large numbers of scripts. IOW, somthing like a grass.plot module which hides the details behind begin() and end() methods, so the scripts themselves don't reference specific format types. A couple of brief comments on the code: if os.environ['PYTHONPATH'] != None: os.environ['PYTHONPATH'] = '/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages:'+os.environ['PYTHONPATH'] This doesn't belong here. mapstats = grass.read_command('r.stats', input = options['input'], fs = ',', nsteps = '255', flags = 'Acn') histlist = mapstats.splitlines() for pair in histlist: Ideally, read_command() shouldn't used for anything which might create a lot of output. For processing bulk output, iterate over stdout, with e.g.: mapstats_p = grass.start_command('r.stats', input = options['input'], fs = ',', nsteps = '255', flags = 'Acn', stdout = subprocess.PIPE) for pair in mapstats_p.stdout: try: pairlist = pair.strip().split(',') ... mapstats_p.wait() [Note the added strip(); the for line in file idiom doesn't strip the line terminator from the line.] As that's likely to be quite common, I have added grass.pipe_command(), which is like start_command() but adds the stdout = subprocess.PIPE automatically, so you can use: mapstats_p = grass.pipe_command('r.stats', input = options['input'], fs = ',', nsteps = '255', flags = 'Acn') for pair in mapstats_p.stdout: ... It returns the Popen object rather than the pipe itself, as you need the Popen object for the wait() call (even if you don't want the exit code, the child process will remain as a zombie until you call wait()). for i in range(cells): plotlist.append(val) Ouch. This is going to cause problems (i.e. fail) for large maps (even for moderately-sized maps, it will be slow). As it stands, this script *isn't* a substitute for d.histogram. As axes.hist() just calls numpy.histogram() then uses axes.bar() (or axes.barh()) to plot the result, you may as well just use axes.bar() directly on the value,count pairs emitted by r.stats. If you specifically want to bin the data, it would be better to either add an option to r.stats, or to bin the data in the script as it is being read, rather than read an entire raster map into a Python list. if len(fcolor.split(':')) == 3: #rgb color r = float(fcolor.split(':')[0]) g = float(fcolor.split(':')[1]) b = float(fcolor.split(':')[2]) #translate to mpl rgb format r = r / 255 g = g / 255 b = b / 255 fcolor = (r,g,b) This should probably be made into a library function; it would also need the named colours (lib/gis/named_colr.c). if options['linecolor'] != None: lcolor = options['linecolor'] else: fcolor = 'k' The else clause should presumably be lcolor. Does this not also need to handle R:G:B? -- Glynn Clements [EMAIL PROTECTED] ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] matplotlib example script
Thanks for the input Glynn. A few responses below. On Jul 24, 2008, at 12:24 PM, Glynn Clements wrote: Michael Barton wrote: I've attached an example script and output for using the MatPlotLib Python library in GRASS. One of the Apple Mail developers needs a dose of the clue-stick. The script (and image) were attached to the HTML alternative of the multipart/alternative section, so they don't exist for anyone using the plain-text alternative. Hmmm. I might be able to help that, but it's hard to remember to switch the way the program works between people here who expect html- like and plain text. This script replicates the functionality of d.histogram, but makes a much nicer looking plot. It also has more options, although I've included only a very few of the formatting possibilities here. The script sends the output from r.stats into 9 lines of code to produce the histogram with formatting options. Most of it is done by 4 lines. This can be run from the command line and does not require a GUI to be installed. It writes to a PNG file, but could easily be set to create a TclTk or wxPython interactive window as a user-selectable option. Before we start using MatPlotLib extensively, we need to figure out how to handle output format selection so that all script can work with any backend, and the scripts can be used alongside d.* commands without the caller needing to distingiush between them. Or, at least, we need to figure out an interface that will allow us to do that later without having to modify large numbers of scripts. IOW, somthing like a grass.plot module which hides the details behind begin() and end() methods, so the scripts themselves don't reference specific format types. I agree. One possibility is to use the ps backend if you are indeed planning to switch the output to postscript. In this example, I just wanted to show that TclTk, wxPython, Qt or other GUI is not required. A couple of brief comments on the code: if os.environ['PYTHONPATH'] != None: os.environ['PYTHONPATH'] = '/Library/Frameworks/Python.framework/ Versions/2.5/lib/python2.5/site-packages:'+os.environ['PYTHONPATH'] This doesn't belong here. Forgot to take this out. This is an issue that my system is having and not only does it not belong in this script, it doesn't help inside this script. mapstats = grass.read_command('r.stats', input = options['input'], fs = ',', nsteps = '255', flags = 'Acn') histlist = mapstats.splitlines() for pair in histlist: Ideally, read_command() shouldn't used for anything which might create a lot of output. For processing bulk output, iterate over stdout, with e.g.: mapstats_p = grass.start_command('r.stats', input = options['input'], fs = ',', nsteps = '255', flags = 'Acn', stdout = subprocess.PIPE) for pair in mapstats_p.stdout: try: pairlist = pair.strip().split(',') ... mapstats_p.wait() [Note the added strip(); the for line in file idiom doesn't strip the line terminator from the line.] The strip() is probably unnecessary; I have a habit of worrying about extraneous spaces causing formating problems. As that's likely to be quite common, I have added grass.pipe_command(), which is like start_command() but adds the stdout = subprocess.PIPE automatically, so you can use: mapstats_p = grass.pipe_command('r.stats', input = options['input'], fs = ',', nsteps = '255', flags = 'Acn') for pair in mapstats_p.stdout: ... It returns the Popen object rather than the pipe itself, as you need the Popen object for the wait() call (even if you don't want the exit code, the child process will remain as a zombie until you call wait()). Thanks. for i in range(cells): plotlist.append(val) Ouch. This is going to cause problems (i.e. fail) for large maps (even for moderately-sized maps, it will be slow). As it stands, this script *isn't* a substitute for d.histogram. Well, I was worried about that too. However, even with the 10m DEM in spearfish, the main time is spent in running r.stats. The rest of the code takes no appreciable time to run. I should try it on a big Terra/ ASTER or Landsat ETM file and see how long it takes. The fastest thing probably is to just read the GRASS histogram file. But this doesn't permit changing bin sizes or other plotting options. As axes.hist() just calls numpy.histogram() then uses axes.bar() (or axes.barh()) to plot the result, you may as well just use axes.bar() directly on the value,count pairs emitted by r.stats. If you specifically want to bin the data, it would be better to either add an option to r.stats, or to bin the data in the script as it is being read, rather than read an entire raster map into a Python list. I agree about r.stats. Wouldn't binning in
Re: [GRASS-dev] matplotlib example script
On Jul 24, 2008, at 6:03 PM, Glynn Clements wrote: Michael Barton wrote: This script replicates the functionality of d.histogram, but makes a much nicer looking plot. It also has more options, although I've included only a very few of the formatting possibilities here. The script sends the output from r.stats into 9 lines of code to produce the histogram with formatting options. Most of it is done by 4 lines. This can be run from the command line and does not require a GUI to be installed. It writes to a PNG file, but could easily be set to create a TclTk or wxPython interactive window as a user-selectable option. Before we start using MatPlotLib extensively, we need to figure out how to handle output format selection so that all script can work with any backend, and the scripts can be used alongside d.* commands without the caller needing to distingiush between them. Or, at least, we need to figure out an interface that will allow us to do that later without having to modify large numbers of scripts. IOW, somthing like a grass.plot module which hides the details behind begin() and end() methods, so the scripts themselves don't reference specific format types. I agree. One possibility is to use the ps backend if you are indeed planning to switch the output to postscript. In this example, I just wanted to show that TclTk, wxPython, Qt or other GUI is not required. Well, the general idea is to support as wide a range of backends as possible. Essentially, we want something similar to the d.* commands, which will output to a variety of backends, even those which are yet to be invented. Also, if we can't find any better way to integrate d.* commands and matplotlib, we may end up adding a d.graph backend to matplotlib. for i in range(cells): plotlist.append(val) Ouch. This is going to cause problems (i.e. fail) for large maps (even for moderately-sized maps, it will be slow). As it stands, this script *isn't* a substitute for d.histogram. Well, I was worried about that too. However, even with the 10m DEM in spearfish, the main time is spent in running r.stats. The rest of the code takes no appreciable time to run. I should try it on a big Terra/ ASTER or Landsat ETM file and see how long it takes. Here: $ g.region rast=elevation.10m $ time r.stats -Acn elevation.10m /dev/null r.stats complete. real0m1.233s user0m1.220s sys 0m0.013s time stuff/histogram_mpldemo.py input=elevation.10m r.stats complete. real0m5.058s user0m4.886s sys 0m0.163s Note that the time taken to get to the r.stats complete point includes the above loop to insert cells copies of val in plotlist. Also, r.stats alone is only using one core, while the script will be using both (one running the script, another running r.stats concurrently). With a single core (or if the other core(s) are busy), it would be worse. As axes.hist() just calls numpy.histogram() then uses axes.bar() (or axes.barh()) to plot the result, you may as well just use axes.bar() directly on the value,count pairs emitted by r.stats. If you specifically want to bin the data, it would be better to either add an option to r.stats, or to bin the data in the script as it is being read, rather than read an entire raster map into a Python list. I agree about r.stats. Wouldn't binning in the script take as long as the current method? Even if it takes just as long, you're less likely to have it fail because plotlist consumed all available RAM. As it stands, plotlist will have one entry for every non-null cell in the raster. When processing bulk data, anything that uses a fixed amount of memory (e.g. one integer per bin) is preferable to using memory proportional to the size of the input. Hence the recommendation to iterate over the lines of r.stats' output rather than read it all into a list then iterate over the list. I'll give that a try. if len(fcolor.split(':')) == 3: #rgb color r = float(fcolor.split(':')[0]) g = float(fcolor.split(':')[1]) b = float(fcolor.split(':')[2]) #translate to mpl rgb format r = r / 255 g = g / 255 b = b / 255 fcolor = (r,g,b) This should probably be made into a library function; it would also need the named colours (lib/gis/named_colr.c). Yes. And it might be one in MatPlotLib. It has some color function, but I haven't explored them yet. Do the grass color names have html equivalents? If so, we don't need /lib/gis/named_colr.c. I suspect that we want our own code here. Eventually there may be Python scripts which don't use matplotlib (i.e. non-d.* scripts) but which still need to parse colours. I've added this to grass.py: def parse_color(val, dflt = None): if val in named_colors: return named_colors[val] vals = val.split(':') if len(vals) == 3:
Re: [GRASS-dev] matplotlib example script
On Jul 24, 2008, at 6:03 PM, Glynn Clements wrote: Even if it takes just as long, you're less likely to have it fail because plotlist consumed all available RAM. As it stands, plotlist will have one entry for every non-null cell in the raster. When processing bulk data, anything that uses a fixed amount of memory (e.g. one integer per bin) is preferable to using memory proportional to the size of the input. Hence the recommendation to iterate over the lines of r.stats' output rather than read it all into a list then iterate over the list. To do a histogram, I need to send ax.hist a list of values. So I don't know how I can get away without creating that list unless I use a completely different algorithm (something from numpy?). This probably isn't a very good example, as there may be other more efficient ways to get the data to actually create the histogram. It's more a way to try this out and see if it is useful--and what needs to be done to use it. On the other hand, in spite of recent improvements, d.hist is still pretty ugly, with formatting issues like varying font sizes on a single axis. And it is not very flexible. It would be nice to see where standard deviations lie, customize axis formatting, etc. For this module in particular, it is probably better to bin the data in another way than I have, and input it into one of the matplotlib plot methods. Here are the results of trying this on a larger image (Terra/ASTER) with almost 30 million cells. The script takes about 2.5X as long as d.histogram (which surprising to me is also using r.stats internally). It still isn't bad at 15 seconds for an unoptimized first shot. GRASS 7.0.svn (Spain_wgs84z30):~ r.info ASTER.0536.vnir.3 + + | Layer:ASTER.0536.vnir.3 Date: Sun Aug 17 23:08:30 2003| | Mapset: aster Login of Creator: cmbarton| | Location: Spain_wgs84z30 | | DataBase: /Users/Shared/ grassdata | | Title: ( ASTER.vnir. 0536.3 ) | | Timestamp: none| | | | | | Type of Map: raster Number of Categories: 252 | | Data Type: CELL | | Rows: 5175 | | Columns: 5787 | | Total Cells: 29947725 | |Projection: UTM (zone 30) | |N: 4423719.5S: 4346094.5 Res: 15 | |E: 774409.5W: 687604.5 Res: 15 | | Range of data:min = 5 max = 252 | | | | Data Description:| |generated by i.in.erdas | | | | | + + GRASS 7.0.svn (Spain_wgs84z30):~ time r.stats -Acn ASTER.0536.vnir. 3 /dev/null 100% r.stats complete. real0m2.774s user0m2.418s sys 0m0.122s GRASS 7.0.svn (Spain_wgs84z30):~ time histogram_mpldemo.py input=ASTER.0536.vnir.3 output=testaster 100% r.stats complete. real0m14.957s user0m11.938s sys 0m1.443s GRASS 7.0.svn (Spain_wgs84z30):~ time d.histogram ASTER.0536.vnir.3 /dev/null 100% r.stats complete. real0m5.811s user0m2.440s sys 0m0.144s Michael ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev