(the reply may look screwy because I am temporarily saddled with a hobbled gmail interface on IE something or the other, and it seems to have a mind of its own)
I unpacked the shapefiles via a module on CPAN and stored the coordinates as text strings (well, everything is a string for SQLite) readymade for a p-in-p algorithm that I adapted from the Wolf book (Algorithms with Perl or some such title -- Hietanemi et al). IIRC, I stored all the x-coords as a string in one col, all the y-coords as a string in another col, and that allowed me to reconstruct the arrays I needed to do the p-in-p. Iterating over all the geoms was decidedly silly -- 250k * 7.5m results in more than a trillion transactions, if my math is correct. Instead, I stored the bounding box of each shape in separate cols, and as a first pass, use SQL selects to narrow down the points per poly. Then ran the p-in-p on each set. Re. advantages -- well, I first tried working with ArcGIS/SDE. An existing solution was taking about 8 days from start to finish (a lot of pre and post work on either side of the overlay), and that is if the process didn't explode prematurely. My process took about 2.5 to 3 days. I do recall building the SQLite db with a bigger page size (I am writing all this from memory. I am very far from my computer right now). I have briefly looked at the Spatialite solution and that does look like a very cool start. In fact, it gives me enough incentive to try and learn C. I do have one question for you all -- why on earth does Spatialite (and also PostGIS as well as ArcGIS) store the geometry as a BLOB? What are the advantages? Is it space? Is it speed? Both? I can't really understand the reason for all that AsWKT and WKB and AsPoints and whatever back-and-forth. Why not just store coords as numbers, or a string of coords as, well, a string of coords? Especially in the case of PostGIS, PostgreSQL has a col type of arrays, so why not just use that? SQLite stores everything as strings anyway, so why bother with the back and forth unless there is speed advantage. I don't care about the space -- hard disk is cheap. Many thanks for this conversation. This is very interesting for me. I would love to see a truly lightweight but truly real geospatial db solution realized, and I am a committed fan of SQLite. On 4/8/08, Martin Davis <[EMAIL PROTECTED]> wrote: > > > P Kishor wrote: > > > > I actually wrote a pretty useful point-in-polygon routine using Perl > > DBD::SQLite unwrapping Shapefiles into a SQLite db and then using > > SQLite for boundary matching. It was for a very large p-in-p (7.5 > > million points against 250k polys) that ArcGIS was choking over. Works > > very sweet in Perl/SQLite in about 20% to 30% of the time taken. > > > > > > > I'm curious about your SQLite spatial solution. > > What approach did you use for storing geometry? How did you load the > shapefiles? > > Since SQLite has no spatial index, did you just iterate over all > geometries for you PIP test? What was your PIP test written in? > > What was the advantage of doing this over just using the raw shapefiles > - was it performance, or ease of data management? > > The SQLite Spatial direction seems like an interesting one to pursue for > just this kind of bulk processing use case. > > > > > > > > -- > Martin Davis > Senior Technical Architect > Refractions Research, Inc. > (250) 383-3022 > > -- Puneet Kishor http://punkish.eidesis.org/ Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/ Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/ _______________________________________________ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users