Thanks for this information, Martin! :-)

I agree with you. Not all kinds of data will benefit
from this: large polygon layers will, and maybe line layers too.
Some heuristics have to be applied here.

Assuming some spatial coherence, a cache of, say, the
last 35,000 visited vertices as a basis
of comparison leads to viable results, even if
they are not optimal.

I implemented this strategy with a LinkedHashMap and
it works fine for my test data. Unfortunately this
produces a bit of temporary garbage (the cache), so
the saving is not obvious at first glance (running GC helps).
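
To illustrate the idea without pulling in JTS, here is a minimal,
self-contained sketch of such a bounded cache. Vertex and VertexCache
are hypothetical names used only for this example (Vertex stands in
for a JTS Coordinate); the real change is in the attached patch. The
map is kept in insertion order, so once the limit is exceeded the
oldest cached vertex is dropped, regardless of how often it was hit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a JTS Coordinate: an immutable 2D vertex. */
final class Vertex {
    final double x;
    final double y;

    Vertex(double x, double y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Vertex)) return false;
        Vertex v = (Vertex) o;
        return Double.compare(x, v.x) == 0 && Double.compare(y, v.y) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * Double.hashCode(x) + Double.hashCode(y);
    }
}

/** Bounded cache that hands out one shared instance per recently seen vertex. */
final class VertexCache {
    private final Map<Vertex, Vertex> cache;

    VertexCache(final int maxEntries) {
        // Insertion-ordered map; the oldest entry is dropped once the limit is exceeded.
        this.cache = new LinkedHashMap<Vertex, Vertex>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Vertex, Vertex> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached instance for (x, y), caching a new one on a miss. */
    Vertex share(double x, double y) {
        Vertex candidate = new Vertex(x, y);
        Vertex shared = cache.get(candidate);
        if (shared == null) {
            cache.put(candidate, candidate);
            shared = candidate;
        }
        return shared;
    }

    int size() {
        return cache.size();
    }
}
```

Two calls to share() with the same x and y return the same object as
long as the entry has not been evicted in between. A vertex that comes
back after eviction gets a fresh instance, which is why the result is
viable but not optimal.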

If someone wants to test it, I've attached
a patch against org.geotools.shapefile.PolygonHandler.

I had a look at the JavaDocs of PackedCoordinateSequence & Co
too. Integrating this into JUMP would _really_ take some work.

@Paul: IIRC Larry and Michaël intern()ed the Strings
       from DBF files to save a lot of memory.
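
For reference, a minimal sketch of what interning attribute values
looks like. InternDemo and internAll are hypothetical names for this
example only; this is not the actual DBF reader code. It relies on
String.intern() returning one canonical instance per distinct value.

```java
/** Hypothetical helper, not taken from the OpenJUMP sources. */
final class InternDemo {
    /** Returns a canonical copy of each value so equal strings share one object. */
    static String[] internAll(String[] values) {
        String[] result = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            // null attribute values are passed through unchanged
            result[i] = values[i] == null ? null : values[i].intern();
        }
        return result;
    }
}
```

This pays off when a column holds few distinct values repeated over
many rows, since every repeat then points at the same String object.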

Regards,
   Sascha

Paul Austin wrote:
> Another huge memory saving can be done by using String.intern() on 
> string objects as they are immutable anyway. I think the latest VM's do 
> some garbage collection on the intern cache so it's not a bad thing to do.
> 
> Paul
> 
> Martin Davis wrote:
>> I'm almost 100% sure that JUMP treats Coordinate objects as immutable 
>> (at least in the core code.  I do know that at least one plugin I wrote 
>> changes the Coordinates in Geometries - my bad!).  I think this should 
>> be a firm design principle of JUMP - it's simply not worth the risk to 
>> mutate Coordinates in-place.  The same goes for Geometrys,  I think.  
>> There's lots of benefits to having immutability, and lots of risks to 
>> not having it.
>>
>> So your Coordinate-sharing idea should work.  Whether this really makes 
>> much of an impact in the general use case I can't say - it's very 
>> dependent on the nature of the data being loaded.  50% savings doesn't 
>> seem like that much to me - but I guess that depends on whether you are 
>> trying to load a 2 GB shapefile!
>>
>> Perhaps this should be called Coordinate-externing, referring to the 
>> similar strategy that Java uses for String constants.
>>
>> Another possible option for providing memory savings is to take 
>> advantage of the JTS CoordinateSequence facility, and use 
>> PackedCoordinateSequences for raw Geometry storage.  This might give an 
>> even bigger memory savings. But it would *definitely* require changes to 
>> the core, since JUMP was mostly written before the JTS CS was 
>> introduced, so the code assumes it can get down-and-dirty with the 
>> Coordinate arrays in JTS. 
>>
>> Sascha L. Teichmann wrote:
>>   
>>> Just out of curiosity:
>>>
>>> When I loaded a larger polygon shapefile (burlulc)
>>> I noticed that the geometries share a lot of
>>> common vertices. In the case of the burlulc layer,
>>> over 1,500,000.
>>> They are represented by com.vividsolutions.jts.geom.Coordinate
>>> objects. If a shapefile gets loaded a new Coordinate object
>>> for each vertex is created.
>>>
>>> Now I added a simple TreeMap to the PolygonHandler of
>>> OpenJUMP's shapefile reader to reuse already created
>>> Coordinate objects and share them with other geometries.
>>>
>>> After loading the data (+ triggering GC) the normal OJ
>>> uses approx. 124MB memory. After the shared vertices
>>> modification OJ uses only approx. 89MB.
>>>
>>> My question: May this mod lead to any side effects?
>>> With JTS? With the CursorTools?
>>>
>>> Coordinate objects are not immutable, so I expect
>>> side effects with e.g. neighboring polygons when
>>> I edit one of them.
>>>
>>> I had a brief look at the code and played with
>>> the CursorTools but I haven't found any side effects
>>> yet.
>>>
>>> This idea comes from playing with OJ on a boring
>>> Friday evening. It only cost me a few seconds to
>>> implement and if you say "This idea is plain stupid!"
>>> I'll drop it immediately .. ;-)
>>>
>>> Kind regards,
>>> Sascha
>>>
>>> -------------------------------------------------------------------------
>>> This SF.net email is sponsored by DB2 Express
>>> Download DB2 Express C - the FREE version of DB2 express and take
>>> control of your XML. No limits. Just data. Click to get it now.
>>> http://sourceforge.net/powerbar/db2/
>>> _______________________________________________
>>> Jump-pilot-devel mailing list
>>> Jump-pilot-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel
>>>
>>>   
>>>     
>>   
> 
> 
Index: src/org/geotools/shapefile/PolygonHandler.java
===================================================================
--- src/org/geotools/shapefile/PolygonHandler.java      (Revision 863)
+++ src/org/geotools/shapefile/PolygonHandler.java      (working copy)
@@ -3,6 +3,8 @@
 import java.io.IOException;
 import java.lang.reflect.Array;
 import java.util.ArrayList;
+import java.util.Map;
+import java.util.LinkedHashMap;
 
 import com.vividsolutions.jts.algorithm.CGAlgorithms;
 import com.vividsolutions.jts.algorithm.RobustCGAlgorithms;
@@ -16,7 +18,47 @@
 public class PolygonHandler implements ShapeHandler{
     protected static CGAlgorithms cga = new RobustCGAlgorithms();
     int myShapeType;
-    
+
+               /**
+                * Coordinate only calcs hash over x and y.
+                * Extending it to hash z too.
+                */
+               private static final class Coord extends Coordinate {
+
+                       public Coord(double x, double y) {
+                               super(x, y);
+                       }
+
+                       public boolean equals(Object o) { // equals3D()
+                               Coord c = (Coord)o;
+                               return x == c.x 
+                                       &&   y == c.y
+                                       &&  (z == c.z || (Double.isNaN(z) && Double.isNaN(c.z)));
+                       }
+
+                       public int hashCode() {
+                               //Algorithm from Effective Java by Joshua Bloch [Jon Aquino]
+                               int result = 17;
+                               result = 37 * result + hashCode(x);
+                               result = 37 * result + hashCode(y);
+                               if (!Double.isNaN(z))
+                                       result = 37 * result + hashCode(z);
+                               return result;                          
+                       }
+               } // class Coord
+
+               /** This is the number of coordinates to store for comparison.
+                *  If the number of vertices is very large it would be
+                *  inefficient to store them all in a HashMap.
+                *  Limiting does not provide the optimal solution
+                *  but if some spatial coherence is given it does 
+                *  a good job.
+                */
+               public static final int MAX_COORDINATE_CACHE = 35000;
+
+               /** the coordinate cache */
+               protected LinkedHashMap coordinateCache;
+
     public PolygonHandler()
     {
         myShapeType = 5;
@@ -53,7 +95,7 @@
     public Geometry read( EndianDataInputStream file , GeometryFactory geometryFactory, int contentLength)
     throws IOException, InvalidShapefileException
     {
-    
+
        int actualReadWords = 0; //actual number of words read (word = 16bits)
         
        // file.setLittleEndianMode(true);
@@ -87,34 +129,73 @@
         
         partOffsets = new int[numParts];
         
-        for(int i = 0;i<numParts;i++){
-            partOffsets[i]=file.readIntLE();
-                       actualReadWords += 2;
+        for (int i = 0; i < numParts; i++) {
+                                       partOffsets[i]=file.readIntLE();
         }
+                               actualReadWords += (numParts << 1); // numParts * 2
         
         //LinearRing[] rings = new LinearRing[numParts];
         ArrayList shells = new ArrayList();
         ArrayList holes = new ArrayList();
-        Coordinate[] coords = new Coordinate[numPoints];
-        
-        for(int t=0;t<numPoints;t++)
-        {
-            coords[t]= new Coordinate(file.readDoubleLE(),file.readDoubleLE());
-                       actualReadWords += 8;
-        }
-        
-        if (myShapeType == 15)
-        {
-                //z
-            file.readDoubleLE();  //zmin
-            file.readDoubleLE();  //zmax
-                       actualReadWords += 8;
-             for(int t=0;t<numPoints;t++)
-            {
-                coords[t].z = file.readDoubleLE();
-                               actualReadWords += 4;
-            }
-        }
+
+                               if (coordinateCache == null) {
+                                       coordinateCache  = new LinkedHashMap(MAX_COORDINATE_CACHE-1) {
+                                               protected boolean removeEldestEntry(Map.Entry entry) {
+                                                       return size() > MAX_COORDINATE_CACHE;
+                                               }
+                                       };
+                               }
+
+        Coordinate [] coords = new Coordinate[numPoints];
+
+                               // Coordinate is not able to hash 3D so wrap
+                               // the coords in subclass Coord.
+                               // This produces a lot of temporary objects so 
+                               // this path is separated from the simple x,y case. :-/
+
+                               if (myShapeType == 15) { // with z
+
+                                       for (int t = 0; t < numPoints; ++t)
+                                               coords[t] = new Coord(
+                                                       file.readDoubleLE(),
+                                                       file.readDoubleLE());
+
+                                       actualReadWords += (numPoints << 3); // numPoints * 8
+                                       
+                                       file.readDoubleLE();  //zmin
+                                       file.readDoubleLE();  //zmax
+                                       actualReadWords += 8;
+
+                                       for (int t = 0; t < numPoints; ++t)
+                                               coords[t].z = file.readDoubleLE();
+
+                                       actualReadWords += (numPoints << 2); // numPoints * 4
+
+                                       for (int t = 0; t < numPoints; ++t) {
+                                               Coord c = (Coord)coords[t];
+                                               Coordinate shared = (Coordinate)coordinateCache.get(c);
+
+                                               if (shared == null)
+                                                       coordinateCache.put(c, shared = new Coordinate(c));
+
+                                               coords[t] = shared;
+                                       }
+                               }
+                               else { // without z -- directly use Coordinate
+                                       Coordinate coord = new Coordinate();
+                                       for (int t = 0; t < numPoints; ++t) {
+                                               coord.x = file.readDoubleLE();
+                                               coord.y = file.readDoubleLE();
+                                               Coordinate shared = (Coordinate)coordinateCache.get(coord);
+                                               if (shared == null) {
+                                                       coordinateCache.put(coord, coord);
+                                                       shared = coord;
+                                                       coord = new Coordinate();
+                                               }
+                                               coords[t] = shared;
+                                       }
+                                       actualReadWords += (numPoints << 3); // numPoints * 8
+                               }
       
         if (myShapeType >= 15)
         {