Hi, This turned to a friendly game at Sunday afternoon. One bug from OGR side is already hunted. >From Even:
"Ok, this is interesting for several reasons : 1) Trying to run the 'ogrinfo -sql "select max(OGR_GEOM_AREA) from tpi_1" tpi_1.shp' showed that a badly written optimization in OGR 1.9.0 broke such a query. Now fixed per http://trac.osgeo.org/gdal/ticket/4633 2) Width GDAL trunk, 1.9, 1.8 and 1.7, I never get Martin's result. I always get : MAX_OGR_GEOM_AREA (Real) = 70726589354.8597 Reached on shape index = 289 (first shape is index = 0): OGRFeature(tpi_1):289 id (Real) = 904743.0000000000000000000000000000000 gridcode (Real) = 4.0000000000000000000000000000000 class_name (String) = Plains or Open Slopes I though it could come from a compiler issue, but I get the same results with GCC 4.4.3 in debug and optimized mode, and also with MSVC 2008 3) This runs in about ~ 5 seconds on all those versions on my machine (after several runs so that the I/O cache is hot) OS: Ubuntu 10.04 64bit CPU : Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz RAM : 4 GB Best regards, Even" -Jukka- ________________________________________ Lähettäjä: Michaël Michaud [[email protected]] Lähetetty: 22. huhtikuuta 2012 14:43 Vastaanottaja: [email protected] Aihe: Re: [JPP-Devel] Performance issue with large polygons with holes in ShapefileReader Hi, > As long as there aren't too many shells, it's ok to use a linear search, > isn't it? Must we do this assertion ? I started a test with your dataset as a single multipolygon. Had to stop it after 1h30... :-( > Did you confirm that OGR is actually reading and forming the > geometries? I still find it hard to believe that it can read the geoms > in only 4 s! And yet, JTS can do too. If I use the entire PolygonHandler procedure but replace the geometry by an empty geometry at the last minute (when Shapefile class feed the geometry collection) to avoid envelope calculation, the read takes about 5 s ! (instead of 16 for a full reading with envelope calculation on my machine). Seems consistent with what you get comparing jeql and ogr. > It's interesting about the envelope computation. In JTS these are > computed lazily, but in the ShapefileReader they're used right away to > test for candidate shells for holes. In your dataset (only simple polygon), your tip #3 completely avoid tests on envelope. So there should be no overhead while reading. Envelope are only computed when the FeatureDataset is built. So reading is very fast, but at one point, envelope calculation is necessary and it takes more time than reading. Now the problem is how to efficiently read multipolygons, with the overhead of envelope calculation, envelope contain test, and PiP test. Not sure that finding holes with only one shell candidate (tip #2) will help much as in complex cases (like your landuse dataset), most holes will have several shell candidates. With your file saved as a single multipolygon, even without any test ( just getEnvelopeIntenal() and getCoordinates let in the inner loop), the computation has no end (ok 250000 holes x 750000 shells > 180 billions of method calls which is something). Maybe for such cases, indexing shells with a strtree would save time, but I'm not sure it would be very efficient because of the many large areas in the dataset. Michaël > > On 4/21/2012 10:42 AM, Michaël Michaud wrote: >> Hi, >> >> Oh yes, the short circuit will not improve performance much in our case >> because most often, the equalsExact will also be performed. >> (I don't think equalsExact should be removed as it could break >> algorithms relying on it) >> >> Maybe the good structure would have been to map shells to hole lists >> with a IdentityHashMap ? >> >> I did some profiling to try to understand the difference between our >> reader and OGR's >> and found that most of the time in PolygonHandler was spent in Envelope >> computation >> >> I think envelope are required to display data efficiently, >> but I could not find where exactly they were computed >> (tried to deactivate envelope computation everywhere in FeatureDataset >> without performance gain). >> >> I also found that we missed a test for the single-ring case (where ccw >> computation >> can be avoided), but there is not much difference at the end. >> >> Michaël >>> Oh - except that in the cases where the geometries are *not* identical, >>> then a slower equals test is performed. So still better to just test >>> reference equality, I think. >>> >>> On 4/21/2012 8:05 AM, Martin Davis wrote: >>>> That does seem like a good idea. I'll look at adding that. >>>> >>>> On 4/21/2012 12:48 AM, Michaël Michaud wrote: >>>>> Implementing #3 I was wondering why not implementing the following >>>>> short circuit in Geometry equals method >>>>> public boolean equals(Object o) { >>>>> if (this == o) return true; >>>>> etc. >>>>> } >>>>> Can you see any drawback ? >>>>> Seems that this way, many algorithms could automatically benefit your >>>>> #3 trick. >>>>> >>>>> >>>> ------------------------------------------------------------------------------ >>>> For Developers, A Lot Can Happen In A Second. >>>> Boundary is the first to Know...and Tell You. >>>> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! >>>> http://p.sf.net/sfu/Boundary-d2dvs2 >>>> _______________________________________________ >>>> Jump-pilot-devel mailing list >>>> [email protected] >>>> https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel >>>> >>>> >>>> ----- >>>> No virus found in this message. >>>> Checked by AVG - www.avg.com >>>> Version: 2012.0.1913 / Virus Database: 2411/4950 - Release Date: 04/21/12 >>>> >>>> >>>> >>> ------------------------------------------------------------------------------ >>> For Developers, A Lot Can Happen In A Second. >>> Boundary is the first to Know...and Tell You. >>> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! >>> http://p.sf.net/sfu/Boundary-d2dvs2 >>> _______________________________________________ >>> Jump-pilot-devel mailing list >>> [email protected] >>> https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel >>> >>> >> ------------------------------------------------------------------------------ >> For Developers, A Lot Can Happen In A Second. >> Boundary is the first to Know...and Tell You. >> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! >> http://p.sf.net/sfu/Boundary-d2dvs2 >> _______________________________________________ >> Jump-pilot-devel mailing list >> [email protected] >> https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel >> >> >> ----- >> No virus found in this message. >> Checked by AVG - www.avg.com >> Version: 2012.0.1913 / Virus Database: 2411/4950 - Release Date: 04/21/12 >> >> >> > ------------------------------------------------------------------------------ > For Developers, A Lot Can Happen In A Second. > Boundary is the first to Know...and Tell You. > Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! > http://p.sf.net/sfu/Boundary-d2dvs2 > _______________________________________________ > Jump-pilot-devel mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel > > ------------------------------------------------------------------------------ For Developers, A Lot Can Happen In A Second. Boundary is the first to Know...and Tell You. Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! http://p.sf.net/sfu/Boundary-d2dvs2 _______________________________________________ Jump-pilot-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel ------------------------------------------------------------------------------ For Developers, A Lot Can Happen In A Second. Boundary is the first to Know...and Tell You. Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! http://p.sf.net/sfu/Boundary-d2dvs2 _______________________________________________ Jump-pilot-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel
