Chagrined by a performance hit apparently involving zlib compression, I patched my local GRASS 7.0 to accept an environment variable that disables raster compression. At least for the particular DCELL rasters I've been using, this yields a ~5x improvement in run time during write operations, at a cost of some extra disk usage that I'm often more than happy to incur. See sample timing outputs below my sig.

Admittedly, the speedup factor drops to ~2.5x if the timing comparisons include a forced sync to disk, because uncompressed output means more I/O. But that's still a nice speedup, and the disk I/O cost may matter little when the raster fits comfortably in the OS page cache and is an intermediate output that gets read back in during a subsequent step of a processing workflow (and is perhaps then removed before ever being flushed to disk).
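
To illustrate the difference between timing a write that may stay in the page cache and one that is forced to disk, here is a minimal POSIX sketch. It is not how the timings in this message were produced (those used the shell's time and sync), and the helper name timed_write is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and return the elapsed
 * wall-clock seconds, or -1.0 on error. If force_sync is nonzero, the
 * measurement also covers an fsync(), i.e. the time needed to push the
 * data out of the OS page cache to the storage device. */
double timed_write(const char *path, const char *buf, size_t len,
                   int force_sync)
{
    struct timespec t0, t1;
    size_t done = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (done < len) {
        /* write() may transfer fewer bytes than requested, so loop */
        ssize_t n = write(fd, buf + done, len - done);

        if (n <= 0) {
            close(fd);
            return -1.0;
        }
        done += (size_t)n;
    }
    if (force_sync && fsync(fd) != 0) {
        close(fd);
        return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling it twice on the same buffer, once with force_sync set to 0 and once with 1, gives a rough sense of how much of the total cost is the flush itself rather than the write call.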

My demo-purposes patch is attached. It just adds a GRASS_NO_COMPRESSION environment variable and then injects a new conditional dispatch into each of the three Rast_open{_,_fp_,_c_}new functions. For cleaner semantics, it might be better to keep the original functions but rename them as *_compressed (paralleling the existing *_uncompressed versions) for callers who really want or need to force compression (e.g., r.compress, which my patch in some sense "breaks" when the environment variable is set), but I didn't do this here. And I haven't looked hard to see whether other modules truly depend on the existing compression behavior.
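
Outside of GRASS, the dispatch described above can be sketched roughly as follows. This is a standalone illustration rather than the patch itself: the enum and the function names use_compression_from_env and choose_open_mode are hypothetical stand-ins, and only the getenv("GRASS_NO_COMPRESSION") test mirrors what the patch does.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the library's compressed/uncompressed
 * open modes. */
enum open_mode { MODE_UNCOMPRESSED = 0, MODE_COMPRESSED = 1 };

/* Mirrors the patch's init-time check: compression stays on unless the
 * variable is present in the environment. */
int use_compression_from_env(void)
{
    return getenv("GRASS_NO_COMPRESSION") ? 0 : 1;
}

/* Hypothetical dispatch: pick the open mode from the flag. */
enum open_mode choose_open_mode(int use_compression)
{
    return use_compression ? MODE_COMPRESSED : MODE_UNCOMPRESSED;
}
```

One consequence of testing only for presence: getenv() returns a non-NULL pointer for any value, so GRASS_NO_COMPRESSION=0 (or an empty value) would also disable compression. Only unsetting the variable restores the default.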

Any chance something like this could make it into trunk?

As a real-world example, I recently wrote a Python module that relies on r.mapcalc, r.neighbors, and r.resamp.stats. With GRASS_NO_COMPRESSION set, total runtime dropped from 20 minutes to 10 minutes on a 12K by 12K input raster, with a disk usage differential that peaked at ~4GB during processing. Outputs were identical other than compression.
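
For a rough sense of the disk footprint involved, the uncompressed size of a raster is just rows x cols x bytes per cell. The sketch below assumes 8-byte double-precision (DCELL) cells, ignores header and null-value files, and takes "12K by 12K" to mean exactly 12000 x 12000; the function name uncompressed_bytes is hypothetical.

```c
#include <stdint.h>

/* Bytes needed to store rows x cols cells of cell_bytes each with no
 * compression. Headers and null-value files are not counted. */
uint64_t uncompressed_bytes(uint64_t rows, uint64_t cols,
                            uint64_t cell_bytes)
{
    return rows * cols * cell_bytes;
}
```

Under those assumptions, uncompressed_bytes(12000, 12000, 8) is 1152000000, or about 1.15 GB per raster, so a differential of a few GB is plausible when a workflow holds several intermediate rasters at once. For the 4801 x 4801 test raster in the timings, the same formula gives 184396808 bytes.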

Cheers,
Jim

------------------------------
James Regetz, Ph.D.
Scientific Programmer/Analyst
National Center for Ecological Analysis & Synthesis
735 State St, Suite 300
Santa Barbara, CA 93101


# timings performed on Ubuntu 10.04 with ample RAM and a recent
# build of GRASS 7.0-svn with the applied patch

# describe the 'test' raster used below; based on some 90m SRTM
# data coerced to double precision
GRASS 7.0.svn (tmp):~ > r.info -g test
...
rows=4801
cols=4801
cells=23049601
datatype=DCELL

GRASS 7.0.svn (tmp):~ > r.univar test
total null and non-null cells: 23049601
total null cells: 0

Of the non-null cells:
----------------------
n: 23049601
minimum: 500
maximum: 3139
range: 2639
mean: 1445.04
mean of absolute values: 1445.04
standard deviation: 336.437
...


# using (default) zlib compression on write
GRASS 7.0.svn (tmp):~ > g.gisenv set="OVERWRITE=1"
GRASS 7.0.svn (tmp):~ > g.region rast=test
GRASS 7.0.svn (tmp):~ > unset GRASS_NO_COMPRESSION
GRASS 7.0.svn (tmp):~ > sync; echo 3 > /proc/sys/vm/drop_caches
GRASS 7.0.svn (tmp):~ > time r.mapcalc "foo = test" --quiet

real    0m13.209s
user    0m12.660s
sys     0m0.400s


# after disabling compression on write
GRASS 7.0.svn (tmp):~ > g.gisenv set="OVERWRITE=1"
GRASS 7.0.svn (tmp):~ > g.region rast=test
GRASS 7.0.svn (tmp):~ > export GRASS_NO_COMPRESSION=1
GRASS 7.0.svn (tmp):~ > sync; echo 3 > /proc/sys/vm/drop_caches
GRASS 7.0.svn (tmp):~ > time r.mapcalc "foo = test" --quiet

real    0m2.514s
user    0m2.320s
sys     0m0.170s

Index: lib/raster/open.c
===================================================================
--- lib/raster/open.c	(revision 51380)
+++ lib/raster/open.c	(working copy)
@@ -376,7 +376,10 @@
  */
 int Rast_open_c_new(const char *name)
 {
-    return open_raster_new(name, OPEN_NEW_COMPRESSED, CELL_TYPE);
+    if (R__.use_compression)
+        return open_raster_new(name, OPEN_NEW_COMPRESSED, CELL_TYPE);
+    else
+        return open_raster_new(name, OPEN_NEW_UNCOMPRESSED, CELL_TYPE);
 }
 
 /*!
@@ -465,7 +468,10 @@
  */
 int Rast_open_fp_new(const char *name)
 {
-    return open_raster_new(name, OPEN_NEW_COMPRESSED, R__.fp_type);
+    if (R__.use_compression)
+        return open_raster_new(name, OPEN_NEW_COMPRESSED, R__.fp_type);
+    else
+        return open_raster_new(name, OPEN_NEW_UNCOMPRESSED, R__.fp_type);
 }
 
 /*!
@@ -890,7 +896,10 @@
  */
 int Rast_open_new(const char *name, RASTER_MAP_TYPE wr_type)
 {
-    return open_raster_new(name, OPEN_NEW_COMPRESSED, wr_type);
+    if (R__.use_compression)
+        return open_raster_new(name, OPEN_NEW_COMPRESSED, wr_type);
+    else
+        return open_raster_new(name, OPEN_NEW_UNCOMPRESSED, wr_type);
 }
 
 /*!
Index: lib/raster/init.c
===================================================================
--- lib/raster/init.c	(revision 51380)
+++ lib/raster/init.c	(working copy)
@@ -92,6 +92,7 @@
 
     R__.nbytes = sizeof(CELL);
     R__.compression_type = getenv("GRASS_INT_ZLIB") ? 2 : 1;
+    R__.use_compression = getenv("GRASS_NO_COMPRESSION") ? 0 : 1;
 
     G_add_error_handler(Rast__error_handler, NULL);
 
Index: lib/raster/R.h
===================================================================
--- lib/raster/R.h	(revision 51380)
+++ lib/raster/R.h	(working copy)
@@ -79,6 +79,7 @@
     int want_histogram;
     int nbytes;
     int compression_type;
+    int use_compression;
     int window_set;		/* Flag: window set?                    */
     int split_window;           /* Separate windows for input and output */
     struct Cell_head rd_window;	/* Window used for input        */

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev
