Thanks for this information, Martin! :-)
I agree with you. Not all kinds of data will profit
from this; large polygon layers will, and maybe line layers too.
Some heuristics have to be applied here.
Assuming some spatial coherence, a cache of, say, the
last 35,000 visited vertices as a basis
of comparison gives viable results, even if
they are not optimal.
I implemented this strategy with a LinkedHashMap and
it works fine for my test data. Unfortunately, the cache itself
produces a bit of temporary garbage, so the saving is not
obvious at first glance (triggering GC helps).
If someone wants to test it, I've attached
a patch against org.geotools.shapefile.PolygonHandler.
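A minimal sketch of such a bounded vertex cache, assuming a small immutable 2D point class in place of JTS Coordinate (the names Pt and VertexCache are hypothetical, not part of JTS or JUMP). LinkedHashMap's removeEldestEntry hook drops the oldest entry once the limit is exceeded, and canonical() hands back an already cached equal vertex so that geometries end up sharing it:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical immutable 2D vertex standing in for JTS Coordinate. */
final class Pt {
    final double x, y;

    Pt(double x, double y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Pt)) return false;
        Pt p = (Pt) o;
        // Double.compare keeps equals() consistent with hashCode(),
        // including NaN and 0.0 vs -0.0.
        return Double.compare(x, p.x) == 0 && Double.compare(y, p.y) == 0;
    }

    @Override public int hashCode() {
        return 31 * Double.hashCode(x) + Double.hashCode(y);
    }
}

/** Hypothetical cache holding at most maxSize distinct vertices. */
final class VertexCache {
    private final Map<Pt, Pt> map;

    VertexCache(final int maxSize) {
        // accessOrder = false: the eldest entry is the oldest insertion.
        map = new LinkedHashMap<Pt, Pt>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Pt, Pt> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Returns a cached vertex equal to p, or caches and returns p itself. */
    Pt canonical(Pt p) {
        Pt shared = map.get(p);
        if (shared == null) {
            map.put(p, p);
            shared = p;
        }
        return shared;
    }

    int size() { return map.size(); }
}
```

With accessOrder = false the cache evicts by insertion order, as the attached patch does; passing true instead would turn it into a least-recently-used cache, which is closer to "last visited".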
I had a look at the JavaDocs of PackedCoordinateSequence & Co.
too. Integrating this into JUMP would _really_ take some work.
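To illustrate why packed storage can save more than vertex sharing, here is a hypothetical sketch (not the JTS PackedCoordinateSequence API) that keeps all ordinates of a ring in a single double[]. Every Coordinate is a separate object with its own header plus x, y and z fields, whereas this layout stores only the x and y values. The catch, as Martin points out, is that code can no longer grab a Coordinate[] and work on it directly:

```java
/**
 * Hypothetical packed XY storage: ordinates laid out as x0, y0, x1, y1, ...
 * One array replaces n separate vertex objects.
 */
final class PackedXY {
    private final double[] ords;

    PackedXY(int size) { ords = new double[2 * size]; }

    int size() { return ords.length / 2; }

    double getX(int i) { return ords[2 * i]; }

    double getY(int i) { return ords[2 * i + 1]; }

    void set(int i, double x, double y) {
        ords[2 * i] = x;
        ords[2 * i + 1] = y;
    }
}
```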
@Paul: IIRC Larry and Michaël intern()ed the Strings
from DBF files to save a lot of memory.
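A minimal sketch of interning attribute values as they come out of a DBF reader, assuming the values arrive as a list of Strings (the class DbfStringInterning and method internAll are hypothetical). Equal values then share one String instance, so a column with a few distinct land-use codes costs only a few strings:

```java
import java.util.ArrayList;
import java.util.List;

final class DbfStringInterning {
    /** Returns a copy of rawValues in which equal strings share one interned instance. */
    static List<String> internAll(List<String> rawValues) {
        List<String> out = new ArrayList<>(rawValues.size());
        for (String s : rawValues) {
            // DBF fields may be empty; keep nulls as they are.
            out.add(s == null ? null : s.intern());
        }
        return out;
    }
}
```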
Regards,
Sascha
Paul Austin wrote:
> Another huge memory saving can be made by using String.intern() on
> string objects, as they are immutable anyway. I think the latest VMs do
> some garbage collection on the intern cache, so it's not a bad thing to do.
>
> Paul
>
> Martin Davis wrote:
>> I'm almost 100% sure that JUMP treats Coordinate objects as immutable
>> (at least in the core code. I do know that at least one plugin I wrote
>> changes the Coordinates in Geometries - my bad!). I think this should
>> be a firm design principle of JUMP - it's simply not worth the risk to
>> mutate Coordinates in-place. The same goes for Geometries, I think.
>> There are lots of benefits to having immutability, and lots of risks to
>> not having it.
>>
>> So your Coordinate-sharing idea should work. Whether this really makes
>> much of an impact in the general use case I can't say - it's very
>> dependent on the nature of the data being loaded. 50% savings doesn't
>> seem like that much to me - but I guess that depends on whether you are
>> trying to load a 2 GB shapefile!
>>
>> Perhaps this should be called Coordinate-interning, referring to the
>> similar strategy that Java uses for String constants.
>>
>> Another possible option for providing memory savings is to take
>> advantage of the JTS CoordinateSequence facility, and use
>> PackedCoordinateSequences for raw Geometry storage. This might give an
>> even bigger memory savings. But it would *definitely* require changes to
>> the core, since JUMP was mostly written before the JTS CS was
>> introduced, so the code assumes it can get down-and-dirty with the
>> Coordinate arrays in JTS.
>>
>> Sascha L. Teichmann wrote:
>>
>>> Just out of curiosity:
>>>
>>> When I loaded a larger polygon shapefile (burlulc),
>>> I noticed that the geometries share a lot of
>>> common vertices: over 1,500,000 in the case of
>>> the burlulc layer.
>>> They are represented by com.vividsolutions.jts.geom.Coordinate
>>> objects. When a shapefile is loaded, a new Coordinate object
>>> is created for each vertex.
>>>
>>> Now I added a simple TreeMap to the PolygonHandler of
>>> OpenJUMP's shapefile reader to reuse already created
>>> Coordinate objects and share them with other geometries.
>>>
>>> After loading the data (and triggering GC), stock OJ
>>> uses approx. 124 MB of memory. With the shared-vertices
>>> modification, OJ uses only approx. 89 MB.
>>>
>>> My question: Could this mod lead to any side effects?
>>> With JTS? With the CursorTools?
>>>
>>> Coordinate objects are not immutable, so I expect
>>> side effects with e.g. neighboring polygons when
>>> I edit one of them.
>>>
>>> I had a brief look at the code and played with
>>> the CursorTools but I haven't found any side effects
>>> yet.
>>>
>>> This idea came from playing with OJ on a boring
>>> Friday evening. It only took me a few seconds to
>>> implement, and if you say "This idea is plain stupid!"
>>> I'll drop it immediately ... ;-)
>>>
>>> Kind regards,
>>> Sascha
>>>
>>> -------------------------------------------------------------------------
>>> This SF.net email is sponsored by DB2 Express
>>> Download DB2 Express C - the FREE version of DB2 express and take
>>> control of your XML. No limits. Just data. Click to get it now.
>>> http://sourceforge.net/powerbar/db2/
>>> _______________________________________________
>>> Jump-pilot-devel mailing list
>>> Jump-pilot-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel
>>>
>>>
>>>
>>
>
>
Index: src/org/geotools/shapefile/PolygonHandler.java
===================================================================
--- src/org/geotools/shapefile/PolygonHandler.java (Revision 863)
+++ src/org/geotools/shapefile/PolygonHandler.java (Arbeitskopie)
@@ -3,6 +3,8 @@
import java.io.IOException;
import java.lang.reflect.Array;
import java.util.ArrayList;
+import java.util.Map;
+import java.util.LinkedHashMap;
import com.vividsolutions.jts.algorithm.CGAlgorithms;
import com.vividsolutions.jts.algorithm.RobustCGAlgorithms;
@@ -16,7 +18,47 @@
public class PolygonHandler implements ShapeHandler{
protected static CGAlgorithms cga = new RobustCGAlgorithms();
int myShapeType;
-
+
+ /**
+ * Coordinate only computes its hash over x and y.
+ * This subclass hashes z too.
+ */
+ private static final class Coord extends Coordinate {
+
+ public Coord(double x, double y) {
+ super(x, y);
+ }
+
+ public boolean equals(Object o) { // equals3D()
+ Coord c = (Coord)o;
+ return x == c.x
+ && y == c.y
+ && (z == c.z || (Double.isNaN(z) && Double.isNaN(c.z)));
+ }
+
+ public int hashCode() {
+ //Algorithm from Effective Java by Joshua Bloch [Jon Aquino]
+ int result = 17;
+ result = 37 * result + hashCode(x);
+ result = 37 * result + hashCode(y);
+ if (!Double.isNaN(z))
+ result = 37 * result + hashCode(z);
+ return result;
+ }
+ } // class Coord
+
+ /** This is the number of coordinates to store for comparison.
+ * If the number of vertices is very large, it would be
+ * inefficient to store them all in a HashMap.
+ * Limiting the cache does not give the optimal solution,
+ * but if there is some spatial coherence it does
+ * a good job.
+ */
+ public static final int MAX_COORDINATE_CACHE = 35000;
+
+ /** the coordinate cache */
+ protected LinkedHashMap coordinateCache;
+
public PolygonHandler()
{
myShapeType = 5;
@@ -53,7 +95,7 @@
public Geometry read( EndianDataInputStream file , GeometryFactory geometryFactory, int contentLength)
throws IOException, InvalidShapefileException
{
-
+
int actualReadWords = 0; //actual number of words read (word = 16bits)
// file.setLittleEndianMode(true);
@@ -87,34 +129,73 @@
partOffsets = new int[numParts];
- for(int i = 0;i<numParts;i++){
- partOffsets[i]=file.readIntLE();
- actualReadWords += 2;
+ for (int i = 0; i < numParts; i++) {
+ partOffsets[i]=file.readIntLE();
}
+ actualReadWords += (numParts << 1); // numParts * 2
//LinearRing[] rings = new LinearRing[numParts];
ArrayList shells = new ArrayList();
ArrayList holes = new ArrayList();
- Coordinate[] coords = new Coordinate[numPoints];
-
- for(int t=0;t<numPoints;t++)
- {
- coords[t]= new Coordinate(file.readDoubleLE(),file.readDoubleLE());
- actualReadWords += 8;
- }
-
- if (myShapeType == 15)
- {
- //z
- file.readDoubleLE(); //zmin
- file.readDoubleLE(); //zmax
- actualReadWords += 8;
- for(int t=0;t<numPoints;t++)
- {
- coords[t].z = file.readDoubleLE();
- actualReadWords += 4;
- }
- }
+
+ if (coordinateCache == null) {
+ coordinateCache = new LinkedHashMap(MAX_COORDINATE_CACHE-1) {
+ protected boolean removeEldestEntry(Map.Entry entry) {
+ return size() > MAX_COORDINATE_CACHE;
+ }
+ };
+ }
+
+ Coordinate [] coords = new Coordinate[numPoints];
+
+ // Coordinate is not able to hash 3D so wrap
+ // the coords in subclass Coord.
+ // This produces a lot of temporary objects so
+ // this path is separated from the simple x,y case. :-/
+
+ if (myShapeType == 15) { // with z
+
+ for (int t = 0; t < numPoints; ++t)
+ coords[t] = new Coord(
+ file.readDoubleLE(),
+ file.readDoubleLE());
+
+ actualReadWords += (numPoints << 3); // numPoints * 8
+
+ file.readDoubleLE(); //zmin
+ file.readDoubleLE(); //zmax
+ actualReadWords += 8;
+
+ for (int t = 0; t < numPoints; ++t)
+ coords[t].z = file.readDoubleLE();
+
+ actualReadWords += (numPoints << 2); // numPoints * 4
+
+ for (int t = 0; t < numPoints; ++t) {
+ Coord c = (Coord)coords[t];
+ Coordinate shared = (Coordinate)coordinateCache.get(c);
+
+ if (shared == null)
+ coordinateCache.put(c, shared = new Coordinate(c));
+
+ coords[t] = shared;
+ }
+ }
+ else { // without z -- directly use Coordinate
+ Coordinate coord = new Coordinate();
+ for (int t = 0; t < numPoints; ++t) {
+ coord.x = file.readDoubleLE();
+ coord.y = file.readDoubleLE();
+ Coordinate shared = (Coordinate)coordinateCache.get(coord);
+ if (shared == null) {
+ coordinateCache.put(coord, coord);
+ shared = coord;
+ coord = new Coordinate();
+ }
+ coords[t] = shared;
+ }
+ actualReadWords += (numPoints << 3); // numPoints * 8
+ }
if (myShapeType >= 15)
{
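The Coord subclass in the patch exists only to give the 3D path a z-aware hash key. A standalone sketch of the same idea follows; XYZKey is a hypothetical name. Unlike the patch, it compares with Double.compare. That call treats NaN z as equal to NaN z and keeps equals() consistent with hashCode() even for 0.0 versus -0.0. It also checks the argument type instead of casting blindly:

```java
/** Hypothetical z-aware vertex key: equal only if x, y and z all match. */
final class XYZKey {
    final double x, y, z;

    XYZKey(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    @Override public boolean equals(Object o) {
        // A type check instead of a bare cast avoids ClassCastException.
        if (!(o instanceof XYZKey)) return false;
        XYZKey k = (XYZKey) o;
        // Double.compare(NaN, NaN) == 0, so two points without z still match.
        return Double.compare(x, k.x) == 0
            && Double.compare(y, k.y) == 0
            && Double.compare(z, k.z) == 0;
    }

    @Override public int hashCode() {
        // Effective Java style combination; Double.hashCode maps every NaN
        // to the same value, matching the Double.compare-based equals().
        int result = 17;
        result = 37 * result + Double.hashCode(x);
        result = 37 * result + Double.hashCode(y);
        result = 37 * result + Double.hashCode(z);
        return result;
    }
}
```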