Re: [GRASS-dev] matplotlib example script

2008-07-26 Thread Michael Barton


On Jul 25, 2008, at 6:56 PM, Glynn Clements wrote:



Michael Barton wrote:


ax.hist() and numpy.histogram() are broken by design. They should
accept an iterator as an argument. Requiring the entire data to be
passed as a list makes them useless for large amounts of data.


Well. For *really* large amounts of data I suppose. And indeed
sometimes people have Gb maps to work with. However, 15 sec. for
histogramming a 30 million cell, 23Mb ASTER file isn't too bad. As  
you

say, it would faster to use C modules in GRASS for binning if we had
them.



Your comments bring out a number of important points that we need to  
consider both in this particular case, and more in general too. Here  
are a few responses.






Things can be written if there is a need for them. But doing things
right is more important than doing things right now.


I agree. This is a proposal accompanied by examples so that others can  
try it out. Sort of like Cairo.





Many of the current problems with GRASS are due to cases where
expediency was allowed to win out over good design (or *any* design).

[As to the specific task of binning, you could probably get a suitable
tool within 5 minutes by scavenging from r.quantile.]


This would be good to add to r.stats, which already has an interface  
for this but which only works with float maps.





Sure, we can provide basic functionality, but where do you draw  
the

line? Which features of a real statistics package *wouldn't* be
useful in GRASS? I'm really quite worried about the potential for
feature creep in this area.


I guess I'd leave it up to the user/developer base to decide what
kinds of functionality we need. If it is something regularly
necessary, someone will probably craft a script for it. If this is
really useful, it might get translated into  C-code. Visualization  
and

spatial analysis is an important part of GIS functionality to me. In
fact, a lot of graphing programs and stat packages would have
difficulty in dealing with 20, 50, or 100 million points. MatPlotLib
(or something like it) seems like a good tool to have available for
development. It's also an encouragement for people to begin to  
develop

scripts in Python.


It's also far too low-level. If you start writing lots of scripts
which call matplotlib directly, you will quickly end up with a
situation where one script lets you set the axis colour and the label
font but not the symbols, another lets you control the symbols but not
the font, etc.


It's not nearly as low level as d.graph or bash. Using pyplot or pylab  
dispenses with even more code. But it's important to see how this  
would work in a GRASS environment rather than just for creating  
general purpose graphs. This means it's up to the script/module/GUI  
developer to decide how much control is appropriate to pass on to users.





[I'm using stylistic properties here for simplicity; there are
probably other properties which are far more important, e.g. being
able to zoom, select log/linear scaling, etc.


If you can't find or aren't happy with existing graphing libraries,
then write one. But don't require each script to include its own
written-from-scratch-in-one-hour graphing library.


My thought was that this could potentially serve as a 'standard'  
graphic library for Python scripting in GRASS. Depending on the  
graphic requirements, it could also serve for some of the 'built in'  
graphing functions in a wxPython GUI environment--that is, things  
built in to the GUI or modules that ship with GRASS. Obviously, we'd  
need to try it out and maybe look at other possible alternatives. This  
library is widely used and well-maintained, and has a lot of  
functionality. So it seems like a good candidate for something like  
this.





What is required is something at a level where the script generates
some data then indicates that the data should be displayed as a
scatter plot. The library should take care of the rest (scales, sizes,
colours, ...)


I agree, especially for 'built-in' graphing like histogramming.



without the script having to explicitly read options
from the user and pass them to matplotlib.


The idea is not that we should attempt to create general-purpose  
graphing applications, but to provide a flexible set of tools that  
GRASS  developers and sophisticated users can use.





Are you going to hard-code the fonts, colours, line width, symbols,
etc, or allow the user to set them?


I guess it depends on the application. Sometimes this is superfluous  
and other times it's very useful.



Assuming the latter, are you going
to force them to specify font= linecolor= textcolor= etc for every
command that they type, or allow them to set defaults? Assuming the
latter, how will the script obtain this information?


The examples that I did gave the user virtually no choice over graph  
formatting or the type of graph displayed, although it is easy enough  
to build it into a script GUI. The existence of a library which 

Re: [GRASS-dev] matplotlib example script

2008-07-26 Thread Yann Chemin
Hi Michael,

I got a nice CASC2D water depth histogram, in a different color...
It nicely work for other datasets... without NAN, which are different
from NULL values, I wish we had a G_set_nan_null_value() function
too...

Yann


2008/7/26 Michael Barton [EMAIL PROTECTED]:
 Yann,

 Have you tried this on a dataset without NAN values?

 Michael
 __
 C. Michael Barton, Professor of Anthropology
 Director of Graduate Studies
 School of Human Evolution  Social Change
 Center for Social Dynamics  Complexity
 Arizona State University
 Tempe, AZ  85287-2402
 USA

 voice: 480-965-6262; fax: 480-965-7671
 www: http://www.public.asu.edu/~cmbarton

 On Jul 25, 2008, at 4:09 PM, Yann Chemin wrote:

 Hi,

 after chmod +x, it created a nice histogram of elevation.10m.
 So I tried on floating point image created by some process.

 here is the problem:
 Traceback (most recent call last):
  File /home/yann/histogram_mpldemo.py, line 173, in module
   main()
  File /home/yann/histogram_mpldemo.py, line 159, in main
   alpha=0.75)
  File /usr/lib/python2.5/site-packages/matplotlib/axes.py, line 6035, in
 hist
   normed=bool(normed), new=True)
  File /usr/lib/python2.5/site-packages/numpy/lib/function_base.py,
 line 236, in histogram
   range = (a.min(), a.max())
 ValueError: zero-size array to ufunc.reduce without identity


 this happens because the data has NAN (see r.info output):
 Range of data:min = nan  max = nan

 Is there a way to discard NAN in a.min() and a.max() calculations?
 Or is there a NAN-resistant mode in matplotlib?

 Yann





-- 
Yann Chemin
International Rice Research Institute
Office: http://www.irri.org/gis
Perso: http://www.freewebs.com/ychemin
attachment: myhistogram.png___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Re: [GRASS-dev] matplotlib example script

2008-07-25 Thread Yann Chemin
Hi Michael,

I tried it in Debian/Sid and here is what I got, a strange permission error

Any idea/suggestion?

--

GRASS 7.0.svn (spearfish60):~  python histogram_mpldemo.py
input=elevation.10m bins=25 output=~/myhistogram fillcolor=20:20:200
start
execl() failed: Permission denied
GRASS 7.0.svn (spearfish60):~ 

--

thanks
Yann

2008/7/25 Yann Chemin [EMAIL PROTECTED]:
 Hi Michael,

 Nice histogram!
 I am going to get python compiled and try it here...
 Yann




-- 
Yann Chemin
International Rice Research Institute
Office: http://www.irri.org/gis
Perso: http://www.freewebs.com/ychemin
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] matplotlib example script

2008-07-25 Thread Glynn Clements

Yann Chemin wrote:

 I tried it in Debian/Sid and here is what I got, a strange permission 
 error
 
 Any idea/suggestion?

chmod +x histogram_mpldemo.py
./histogram_mpldemo.py

Even if you invoke the python interpreter explicitly, the script will
run g.parser, which will re-execute the script directly, so it needs
to be executable. This is true for any script which uses g.parser.

-- 
Glynn Clements [EMAIL PROTECTED]
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] matplotlib example script

2008-07-25 Thread Glynn Clements

Michael Barton wrote:

  Even if it takes just as long, you're less likely to have it fail
  because plotlist consumed all available RAM. As it stands, plotlist
  will have one entry for every non-null cell in the raster.
 
  When processing bulk data, anything that uses a fixed amount of
  memory (e.g. one integer per bin) is preferable to using memory
  proportional to the size of the input.
 
  Hence the recommendation to iterate over the lines of r.stats' output
  rather than read it all into a list then iterate over the list.
 
 To do a histogram, I need to send ax.hist a list of values. So I don't  
 know how I can get away without creating that list unless I use a  
 completely different algorithm (something from numpy?).

Don't use axes.hist(), use axes.bar(). I.e. leave the calculations to
GRASS, and only use matplotlib for plotting.

ax.hist() and numpy.histogram() are broken by design. They should
accept an iterator as an argument. Requiring the entire data to be
passed as a list makes them useless for large amounts of data.

If we were to impose a requirement that maps must fit into memory,
writing GRASS modules would be significantly simpler. GRASS would also
be significantly less useful.

 On the other hand, in spite of recent improvements, d.hist is still  
 pretty ugly, with formatting issues like varying font sizes on a  
 single axis. And it is not very flexible. It would be nice to see  
 where standard deviations lie, customize axis formatting, etc. For  
 this module in particular, it is probably better to bin the data in  
 another way than I have, and input it into one of the matplotlib plot  
 methods.

Actually, for statistical information, the most important feature is
the ability output data in formats that are useful to *real*
statistical software. There must be packages out there (even free
ones) which will do a far better job than anything we are going to
provide.

Sure, we can provide basic functionality, but where do you draw the
line? Which features of a real statistics package *wouldn't* be
useful in GRASS? I'm really quite worried about the potential for
feature creep in this area.

And library of plotting functions isn't a statistics package. You want
something where GRASS hands over the data and forgets about it. We
shouldn't be responsible for communicating from the user to the
software details such as what type of graph to draw, or the colours or
symbols or whatever.

-- 
Glynn Clements [EMAIL PROTECTED]
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] matplotlib example script

2008-07-25 Thread Yann Chemin
Hi,

after chmod +x, it created a nice histogram of elevation.10m.
So I tried on floating point image created by some process.

here is the problem:
Traceback (most recent call last):
  File /home/yann/histogram_mpldemo.py, line 173, in module
main()
  File /home/yann/histogram_mpldemo.py, line 159, in main
alpha=0.75)
  File /usr/lib/python2.5/site-packages/matplotlib/axes.py, line 6035, in hist
normed=bool(normed), new=True)
  File /usr/lib/python2.5/site-packages/numpy/lib/function_base.py,
line 236, in histogram
range = (a.min(), a.max())
ValueError: zero-size array to ufunc.reduce without identity


this happens because the data has NAN (see r.info output):
 Range of data:min = nan  max = nan

Is there a way to discard NAN in a.min() and a.max() calculations?
Or is there a NAN-resistant mode in matplotlib?

Yann
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] matplotlib example script

2008-07-25 Thread Michael Barton
The code is pretty simple. I would guess that NAN data could be  
discarded when making the list that is histogrammed. I'm kind of  
surprised that it shows up as r.stats is run to discard nulls. But I  
guess these aren't seen as nulls by r.stats.


Michael
__
C. Michael Barton, Professor of Anthropology
Director of Graduate Studies
School of Human Evolution  Social Change
Center for Social Dynamics  Complexity
Arizona State University
Tempe, AZ  85287-2402
USA

voice: 480-965-6262; fax: 480-965-7671
www: http://www.public.asu.edu/~cmbarton

On Jul 25, 2008, at 4:09 PM, Yann Chemin wrote:


Hi,

after chmod +x, it created a nice histogram of elevation.10m.
So I tried on floating point image created by some process.

here is the problem:
Traceback (most recent call last):
 File /home/yann/histogram_mpldemo.py, line 173, in module
   main()
 File /home/yann/histogram_mpldemo.py, line 159, in main
   alpha=0.75)
 File /usr/lib/python2.5/site-packages/matplotlib/axes.py, line  
6035, in hist

   normed=bool(normed), new=True)
 File /usr/lib/python2.5/site-packages/numpy/lib/function_base.py,
line 236, in histogram
   range = (a.min(), a.max())
ValueError: zero-size array to ufunc.reduce without identity


this happens because the data has NAN (see r.info output):
Range of data:min = nan  max = nan

Is there a way to discard NAN in a.min() and a.max() calculations?
Or is there a NAN-resistant mode in matplotlib?

Yann


___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] matplotlib example script

2008-07-25 Thread Michael Barton

Yann,

Have you tried this on a dataset without NAN values?

Michael
__
C. Michael Barton, Professor of Anthropology
Director of Graduate Studies
School of Human Evolution  Social Change
Center for Social Dynamics  Complexity
Arizona State University
Tempe, AZ  85287-2402
USA

voice: 480-965-6262; fax: 480-965-7671
www: http://www.public.asu.edu/~cmbarton

On Jul 25, 2008, at 4:09 PM, Yann Chemin wrote:


Hi,

after chmod +x, it created a nice histogram of elevation.10m.
So I tried on floating point image created by some process.

here is the problem:
Traceback (most recent call last):
 File /home/yann/histogram_mpldemo.py, line 173, in module
   main()
 File /home/yann/histogram_mpldemo.py, line 159, in main
   alpha=0.75)
 File /usr/lib/python2.5/site-packages/matplotlib/axes.py, line  
6035, in hist

   normed=bool(normed), new=True)
 File /usr/lib/python2.5/site-packages/numpy/lib/function_base.py,
line 236, in histogram
   range = (a.min(), a.max())
ValueError: zero-size array to ufunc.reduce without identity


this happens because the data has NAN (see r.info output):
Range of data:min = nan  max = nan

Is there a way to discard NAN in a.min() and a.max() calculations?
Or is there a NAN-resistant mode in matplotlib?

Yann


___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] matplotlib example script

2008-07-25 Thread Glynn Clements

Michael Barton wrote:

  ax.hist() and numpy.histogram() are broken by design. They should
  accept an iterator as an argument. Requiring the entire data to be
  passed as a list makes them useless for large amounts of data.
 
 Well. For *really* large amounts of data I suppose. And indeed  
 sometimes people have Gb maps to work with. However, 15 sec. for  
 histogramming a 30 million cell, 23Mb ASTER file isn't too bad. As you  
 say, it would faster to use C modules in GRASS for binning if we had  
 them.

Things can be written if there is a need for them. But doing things
right is more important than doing things right now.

Many of the current problems with GRASS are due to cases where
expediency was allowed to win out over good design (or *any* design).

[As to the specific task of binning, you could probably get a suitable
tool within 5 minutes by scavenging from r.quantile.]

  Sure, we can provide basic functionality, but where do you draw the
  line? Which features of a real statistics package *wouldn't* be
  useful in GRASS? I'm really quite worried about the potential for
  feature creep in this area.
 
 I guess I'd leave it up to the user/developer base to decide what  
 kinds of functionality we need. If it is something regularly  
 necessary, someone will probably craft a script for it. If this is  
 really useful, it might get translated into  C-code. Visualization and  
 spatial analysis is an important part of GIS functionality to me. In  
 fact, a lot of graphing programs and stat packages would have  
 difficulty in dealing with 20, 50, or 100 million points. MatPlotLib  
 (or something like it) seems like a good tool to have available for  
 development. It's also an encouragement for people to begin to develop  
 scripts in Python.

It's also far too low-level. If you start writing lots of scripts
which call matplotlib directly, you will quickly end up with a
situation where one script lets you set the axis colour and the label
font but not the symbols, another lets you control the symbols but not
the font, etc.

[I'm using stylistic properties here for simplicity; there are
probably other properties which are far more important, e.g. being
able to zoom, select log/linear scaling, etc.]

If you can't find or aren't happy with existing graphing libraries,
then write one. But don't require each script to include its own
written-from-scratch-in-one-hour graphing library.

What is required is something at a level where the script generates
some data then indicates that the data should be displayed as a
scatter plot. The library should take care of the rest (scales, sizes,
colours, ...) without the script having to explicitly read options
from the user and pass them to matplotlib.

Are you going to hard-code the fonts, colours, line width, symbols,
etc, or allow the user to set them? Assuming the latter, are you going
to force them to specify font= linecolor= textcolor= etc for every
command that they type, or allow them to set defaults? Assuming the
latter, how will the script obtain this information?

Also, are these scripts going to be usable from within the GUI? (I.e. 
display the graphics in a window created by the GUI, not just dump a
PNG file to the disk). How are the file format, dimensions, filename
etc communicated between the two?

Will the script be able to communicate the set of available display
attributes to the GUI (in such a way that it can distinguish what to
display from how to display it)?

This, in a nutshell, is the difference between software engineering
and coding. I've worked with too many coders; and by worked
with, I mean cleaned up after.

That isn't to say that you shouldn't start to write anything until you
have a complete architectural design. Just that you should expect the
first dozen or so attempts to serve as learning exercises rather than
something which will eventually be used. And expect the first few
attempts to tell you more about what *won't* work than what will.

In many regards, trial-and-error can produce a better design than
trying to operate from a purely theoretical perspective. Hindsight
tends to be more accurate than foresight, albeit with the drawback
that it takes rather more effort to obtain.

-- 
Glynn Clements [EMAIL PROTECTED]
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] matplotlib example script

2008-07-25 Thread Glynn Clements

Yann Chemin wrote:

 this happens because the data has NAN (see r.info output):
  Range of data:min = nan  max = nan
 
 Is there a way to discard NAN in a.min() and a.max() calculations?
 Or is there a NAN-resistant mode in matplotlib?

NaN should never appear in GRASS rasters. When it does, it's a bug in
the module which created the map (do you know which module was
responsible in this case?).

One of the things which is on my to-do list for 7.x is to change
G_is_[fd]_null_value() to treat *all* NaN values as null (currently,
it checks for a specific bit pattern, which is just one possible NaN
value).

Once that is done, then r.stats' -n flag will prevent NaNs from making
it into the output.

In the meantime, you can clean up maps which contain NaNs with e.g.:

r.mapcalc 'newmap = if(oldmap == oldmap,oldmap,null())'

or, with recent versions of r.null:

r.null map setnull=nan

-- 
Glynn Clements [EMAIL PROTECTED]
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] matplotlib example script

2008-07-24 Thread Michael Barton
I guess the mailbot stripped out my small attachments. So here are  
links for those of you interested


Script: 
http://www.public.asu.edu/~cmbarton/files/grass_scripts/histogram_mpldemo.py
Example output: 
http://www.public.asu.edu/~cmbarton/files/grass_scripts/myhistogram.png

Michael

On Jul 24, 2008, at 9:12 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] 
 wrote:



Date: Thu, 24 Jul 2008 09:12:41 -0700
From: Michael Barton [EMAIL PROTECTED]
Subject: [GRASS-dev] matplotlib example script
To: grass developers grass-dev@lists.osgeo.org
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; charset=us-ascii

I've attached an example script and output for using the MatPlotLib
Python library in GRASS. This script replicates the functionality of
d.histogram, but makes a much nicer looking plot. It also has more
options, although I've included only a very few of the formatting
possibilities here. The script sends the output from r.stats into 9
lines of code to produce the histogram with formatting options. Most
of it is done by 4 lines. This can be run from the command line and
does not require a GUI to be installed. It writes to a PNG file, but
could easily be set to create a TclTk or wxPython interactive window
as a user-selectable option.

The following command produces the attached histogram:

histogram_mpldemo.py input=elevation.10m bins=25 output=/users/
cmbarton/documents/myhistogram fillcolor=20:20:200

Maybe I'll try an xy graph of 2 maps/images with density plot options
next.

To run this, you need to install MatPlotLib http://matplotlib.sourceforge.net/

, which requires Numpy http://numpy.scipy.org/. Both are

installable as binary eggs via the Python easy_install package manager
or from most Linux package managers. Of course you need Python too.

Enjoy
Michael

C. Michael Barton, Professor of Anthropology
Director of Graduate Studies
School of Human Evolution  Social Change
Center for Social Dynamics  Complexity
Arizona State University

Phone: 480-965-6262
Fax: 480-965-7671
www: www.public.asu.edu/~cmbarton


___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] matplotlib example script

2008-07-24 Thread Glynn Clements

Michael Barton wrote:

 I've attached an example script and output for using the MatPlotLib  
 Python library in GRASS.

One of the Apple Mail developers needs a dose of the clue-stick. The
script (and image) were attached to the HTML alternative of the
multipart/alternative section, so they don't exist for anyone using
the plain-text alternative.

 This script replicates the functionality of  
 d.histogram, but makes a much nicer looking plot. It also has more  
 options, although I've included only a very few of the formatting  
 possibilities here. The script sends the output from r.stats into 9  
 lines of code to produce the histogram with formatting options. Most  
 of it is done by 4 lines. This can be run from the command line and  
 does not require a GUI to be installed. It writes to a PNG file, but  
 could easily be set to create a TclTk or wxPython interactive window  
 as a user-selectable option.

Before we start using MatPlotLib extensively, we need to figure out
how to handle output format selection so that all script can work with
any backend, and the scripts can be used alongside d.* commands
without the caller needing to distingiush between them.

Or, at least, we need to figure out an interface that will allow us to
do that later without having to modify large numbers of scripts. IOW,
somthing like a grass.plot module which hides the details behind
begin() and end() methods, so the scripts themselves don't reference
specific format types.

A couple of brief comments on the code:

 if os.environ['PYTHONPATH'] != None:
 os.environ['PYTHONPATH'] = 
 '/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages:'+os.environ['PYTHONPATH']

This doesn't belong here.

 mapstats = grass.read_command('r.stats', input = options['input'], fs = 
 ',',
 nsteps = '255', flags = 'Acn')
 histlist = mapstats.splitlines()
 for pair in histlist:

Ideally, read_command() shouldn't used for anything which might create
a lot of output.

For processing bulk output, iterate over stdout, with e.g.:

mapstats_p = grass.start_command('r.stats', input = options['input'], 
fs = ',',
  nsteps = '255', flags = 'Acn', stdout 
= subprocess.PIPE)
for pair in mapstats_p.stdout:
try:
pairlist = pair.strip().split(',')
...

mapstats_p.wait()

[Note the added strip(); the for line in file idiom doesn't strip the
line terminator from the line.]

As that's likely to be quite common, I have added grass.pipe_command(),
which is like start_command() but adds the stdout = subprocess.PIPE
automatically, so you can use:

mapstats_p = grass.pipe_command('r.stats', input = options['input'], fs 
= ',',
  nsteps = '255', flags = 'Acn')
for pair in mapstats_p.stdout:
...

It returns the Popen object rather than the pipe itself, as you need
the Popen object for the wait() call (even if you don't want the exit
code, the child process will remain as a zombie until you call wait()).

 for i in range(cells):
 plotlist.append(val)

Ouch. This is going to cause problems (i.e. fail) for large maps (even
for moderately-sized maps, it will be slow). As it stands, this script
*isn't* a substitute for d.histogram.

As axes.hist() just calls numpy.histogram() then uses axes.bar() (or
axes.barh()) to plot the result, you may as well just use axes.bar()
directly on the value,count pairs emitted by r.stats.

If you specifically want to bin the data, it would be better to either
add an option to r.stats, or to bin the data in the script as it is
being read, rather than read an entire raster map into a Python list.

 if len(fcolor.split(':')) == 3:
 #rgb color
 r = float(fcolor.split(':')[0])
 g = float(fcolor.split(':')[1])
 b = float(fcolor.split(':')[2])
 #translate to mpl rgb format
 r = r / 255
 g = g / 255
 b = b / 255
 fcolor = (r,g,b)

This should probably be made into a library function; it would also
need the named colours (lib/gis/named_colr.c).

 if options['linecolor'] != None:   
 lcolor = options['linecolor']
 else: fcolor = 'k'

The else clause should presumably be lcolor. Does this not also need to
handle R:G:B?

-- 
Glynn Clements [EMAIL PROTECTED]
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] matplotlib example script

2008-07-24 Thread Michael Barton

Thanks for the input Glynn. A few responses below.


On Jul 24, 2008, at 12:24 PM, Glynn Clements wrote:



Michael Barton wrote:


I've attached an example script and output for using the MatPlotLib
Python library in GRASS.


One of the Apple Mail developers needs a dose of the clue-stick. The
script (and image) were attached to the HTML alternative of the
multipart/alternative section, so they don't exist for anyone using
the plain-text alternative.


Hmmm. I might be able to help that, but it's hard to remember to  
switch the way the program works between people here who expect html- 
like and plain text.






This script replicates the functionality of
d.histogram, but makes a much nicer looking plot. It also has more
options, although I've included only a very few of the formatting
possibilities here. The script sends the output from r.stats into 9
lines of code to produce the histogram with formatting options. Most
of it is done by 4 lines. This can be run from the command line and
does not require a GUI to be installed. It writes to a PNG file, but
could easily be set to create a TclTk or wxPython interactive window
as a user-selectable option.


Before we start using MatPlotLib extensively, we need to figure out
how to handle output format selection so that all script can work with
any backend, and the scripts can be used alongside d.* commands
without the caller needing to distingiush between them.

Or, at least, we need to figure out an interface that will allow us to
do that later without having to modify large numbers of scripts. IOW,
somthing like a grass.plot module which hides the details behind
begin() and end() methods, so the scripts themselves don't reference
specific format types.


I agree. One possibility is to use the ps backend if you are indeed  
planning to switch the output to postscript. In this example, I just  
wanted to show that TclTk, wxPython, Qt or other GUI is not required.





A couple of brief comments on the code:


if os.environ['PYTHONPATH'] != None:
   os.environ['PYTHONPATH'] = '/Library/Frameworks/Python.framework/ 
Versions/2.5/lib/python2.5/site-packages:'+os.environ['PYTHONPATH']


This doesn't belong here.


Forgot to take this out. This is an issue that my system is having and  
not only does it not belong in this script, it doesn't help inside  
this script.






   mapstats = grass.read_command('r.stats', input =  
options['input'], fs = ',',

   nsteps = '255', flags = 'Acn')
   histlist = mapstats.splitlines()
   for pair in histlist:


Ideally, read_command() shouldn't used for anything which might create
a lot of output.

For processing bulk output, iterate over stdout, with e.g.:

	mapstats_p = grass.start_command('r.stats', input =  
options['input'], fs = ',',
	 	  nsteps = '255', flags = 'Acn', stdout =  
subprocess.PIPE)

for pair in mapstats_p.stdout:
try:
pairlist = pair.strip().split(',')
...

mapstats_p.wait()

[Note the added strip(); the for line in file idiom doesn't strip  
the

line terminator from the line.]



The strip() is probably unnecessary; I have a habit of worrying about  
extraneous spaces causing formating problems.


As that's likely to be quite common, I have added  
grass.pipe_command(),

which is like start_command() but adds the stdout = subprocess.PIPE
automatically, so you can use:

	mapstats_p = grass.pipe_command('r.stats', input =  
options['input'], fs = ',',

  nsteps = '255', flags = 'Acn')
for pair in mapstats_p.stdout:
...

It returns the Popen object rather than the pipe itself, as you need
the Popen object for the wait() call (even if you don't want the exit
code, the child process will remain as a zombie until you call  
wait()).


Thanks.





   for i in range(cells):
   plotlist.append(val)


Ouch. This is going to cause problems (i.e. fail) for large maps (even
for moderately-sized maps, it will be slow). As it stands, this script
*isn't* a substitute for d.histogram.


Well, I was worried about that too. However, even with the 10m DEM in  
spearfish, the main time is spent in running r.stats. The rest of the  
code takes no appreciable time to run. I should try it on a big Terra/ 
ASTER or Landsat ETM file and see how long it takes.


The fastest thing probably is to just read the GRASS histogram file.  
But this doesn't permit changing bin sizes or other plotting options.





As axes.hist() just calls numpy.histogram() then uses axes.bar() (or
axes.barh()) to plot the result, you may as well just use axes.bar()
directly on the value,count pairs emitted by r.stats.

If you specifically want to bin the data, it would be better to either
add an option to r.stats, or to bin the data in the script as it is
being read, rather than read an entire raster map into a Python list.


I agree about r.stats. Wouldn't binning in 

Re: [GRASS-dev] matplotlib example script

2008-07-24 Thread Michael Barton


On Jul 24, 2008, at 6:03 PM, Glynn Clements wrote:



Michael Barton wrote:


This script replicates the functionality of
d.histogram, but makes a much nicer looking plot. It also has more
options, although I've included only a very few of the formatting
possibilities here. The script sends the output from r.stats into 9
lines of code to produce the histogram with formatting options.  
Most

of it is done by 4 lines. This can be run from the command line and
does not require a GUI to be installed. It writes to a PNG file,  
but
could easily be set to create a TclTk or wxPython interactive  
window

as a user-selectable option.


Before we start using MatPlotLib extensively, we need to figure out
how to handle output format selection so that all script can work  
with

any backend, and the scripts can be used alongside d.* commands
without the caller needing to distingiush between them.

Or, at least, we need to figure out an interface that will allow  
us to
do that later without having to modify large numbers of scripts.  
IOW,

somthing like a grass.plot module which hides the details behind
begin() and end() methods, so the scripts themselves don't reference
specific format types.


I agree. One possibility is to use the ps backend if you are indeed
planning to switch the output to postscript. In this example, I just
wanted to show that TclTk, wxPython, Qt or other GUI is not required.


Well, the general idea is to support as wide a range of backends as
possible. Essentially, we want something similar to the d.* commands,
which will output to a variety of backends, even those which are yet
to be invented.

Also, if we can't find any better way to integrate d.* commands and
matplotlib, we may end up adding a d.graph backend to matplotlib.


  for i in range(cells):
  plotlist.append(val)


Ouch. This is going to cause problems (i.e. fail) for large maps  
(even
for moderately-sized maps, it will be slow). As it stands, this  
script

*isn't* a substitute for d.histogram.


Well, I was worried about that too. However, even with the 10m DEM in
spearfish, the main time is spent in running r.stats. The rest of the
code takes no appreciable time to run. I should try it on a big  
Terra/

ASTER or Landsat ETM file and see how long it takes.


Here:

$ g.region rast=elevation.10m
$ time r.stats -Acn elevation.10m /dev/null
r.stats complete.

real0m1.233s
user0m1.220s
sys 0m0.013s
time stuff/histogram_mpldemo.py input=elevation.10m
r.stats complete.

real0m5.058s
user0m4.886s
sys 0m0.163s

Note that the time taken to get to the r.stats complete point
includes the above loop to insert cells copies of val in
plotlist.

Also, r.stats alone is only using one core, while the script will be
using both (one running the script, another running r.stats
concurrently). With a single core (or if the other core(s) are busy),
it would be worse.


As axes.hist() just calls numpy.histogram() then uses axes.bar() (or
axes.barh()) to plot the result, you may as well just use axes.bar()
directly on the value,count pairs emitted by r.stats.

If you specifically want to bin the data, it would be better to  
either

add an option to r.stats, or to bin the data in the script as it is
being read, rather than read an entire raster map into a Python  
list.


I agree about r.stats. Wouldn't binning in the script take as long as
the current method?


Even if it takes just as long, you're less likely to have it fail
because plotlist consumed all available RAM. As it stands, plotlist
will have one entry for every non-null cell in the raster.

When processing bulk data, anything that uses a fixed amount of
memory (e.g. one integer per bin) is preferable to using memory
proportional to the size of the input.

Hence the recommendation to iterate over the lines of r.stats' output
rather than read it all into a list then iterate over the list.


I'll give that a try.





  if len(fcolor.split(':')) == 3:
  #rgb color
  r = float(fcolor.split(':')[0])
  g = float(fcolor.split(':')[1])
  b = float(fcolor.split(':')[2])
  #translate to mpl rgb format
  r = r / 255
  g = g / 255
  b = b / 255
  fcolor = (r,g,b)


This should probably be made into a library function; it would also
need the named colours (lib/gis/named_colr.c).


Yes. And it might be one in MatPlotLib. It has some color function,
but I haven't explored them yet. Do the grass color names have html
equivalents? If so, we don't need /lib/gis/named_colr.c.


I suspect that we want our own code here. Eventually there may be
Python scripts which don't use matplotlib (i.e. non-d.* scripts) but
which still need to parse colours.

I've added this to grass.py:

def parse_color(val, dflt = None):
if val in named_colors:
return named_colors[val]

vals = val.split(':')
if len(vals) == 3:

Re: [GRASS-dev] matplotlib example script

2008-07-24 Thread Michael Barton


On Jul 24, 2008, at 6:03 PM, Glynn Clements wrote:


Even if it takes just as long, you're less likely to have it fail
because plotlist consumed all available RAM. As it stands, plotlist
will have one entry for every non-null cell in the raster.

When processing bulk data, anything that uses a fixed amount of
memory (e.g. one integer per bin) is preferable to using memory
proportional to the size of the input.

Hence the recommendation to iterate over the lines of r.stats' output
rather than read it all into a list then iterate over the list.


To do a histogram, I need to send ax.hist a list of values. So I don't  
know how I can get away without creating that list unless I use a  
completely different algorithm (something from numpy?).


This probably isn't a very good example, as there may be other more  
efficient ways to get the data to actually create the histogram. It's  
more a way to try this out and see if it is useful--and what needs to  
be done to use it.


On the other hand, in spite of recent improvements, d.hist is still  
pretty ugly, with formatting issues like varying font sizes on a  
single axis. And it is not very flexible. It would be nice to see  
where standard deviations lie, customize axis formatting, etc. For  
this module in particular, it is probably better to bin the data in  
another way than I have, and input it into one of the matplotlib plot  
methods.


Here are the results of trying this on a larger image (Terra/ASTER)  
with almost 30 million cells. The script takes about 2.5X as long as  
d.histogram (which surprising to me is also using r.stats internally).  
It still isn't bad at 15 seconds for an unoptimized first shot.



GRASS 7.0.svn (Spain_wgs84z30):~  r.info ASTER.0536.vnir.3   
+ 
+
 | Layer:ASTER.0536.vnir.3  Date: Sun Aug 17 23:08:30  
2003|
 | Mapset:   aster  Login of Creator:  
cmbarton|
 | Location:  
Spain_wgs84z30   |
 | DataBase: /Users/Shared/ 
grassdata  |
 | Title: ( ASTER.vnir. 
0536.3 )   |
 | Timestamp:  
none|
  
| 
|
  
| 
|
 |   Type of Map:  raster   Number of Categories:  
252 |
 |   Data Type: 
CELL   |
 |   Rows:  
5175   |
 |   Columns:   
5787   |
 |   Total Cells:   
29947725   |
 |Projection: UTM (zone  
30)   |
 |N:  4423719.5S:  4346094.5   Res: 
15 |
 |E:   774409.5W:   687604.5   Res: 
15 |
 |   Range of data:min = 5  max =  
252 |
  
| 
|
 |   Data  
Description:|
 |generated by  
i.in.erdas |
  
| 
|
  
| 
|
  
+ 
+



GRASS 7.0.svn (Spain_wgs84z30):~  time r.stats -Acn ASTER.0536.vnir. 
3  /dev/null

 100%
r.stats complete.

real0m2.774s
user0m2.418s
sys 0m0.122s
GRASS 7.0.svn (Spain_wgs84z30):~  time histogram_mpldemo.py  
input=ASTER.0536.vnir.3 output=testaster

 100%
r.stats complete.

real0m14.957s
user0m11.938s
sys 0m1.443s

GRASS 7.0.svn (Spain_wgs84z30):~  time d.histogram ASTER.0536.vnir.3  
/dev/null 100%

r.stats complete.

real0m5.811s
user0m2.440s
sys 0m0.144s

Michael

___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev