On Nov 14, 2007 9:48 AM, Martin Davis <[EMAIL PROTECTED]> wrote: > Lee, > > Having done some more investigation, I can say that your dataset is a > poster child for the use of Cascaded Union. This is because it contains > relatively simple geometries with a very high degree of overlap. > Cascaded Union has the effect of quickly merging and discarding linework > which doesn't appear in the final result. This makes each individual > union operation fairly performant. > > In JTS-land my results were: > > Cascaded Union: 20 sec > Iterated Union: 3 h 40 min ! (This is equivalent to the PostGIS > geomunion aggregate) > > I realize this doesn't help directly with your work, but it is a good > example of why it would be nice to have CascadedUnion functionality in > PostGIS. > > I can also see why buffer was struggling with this data. It has to deal > with all the linework in the dataset at once, and with that many > overlapping lines it is simply overwhelmed. Using buffer() is a hack > to get around the lack of true CascadedUnion. It works in many cases, > but does have its limits. > > Martin > *Apologies for the cross-posting*
This is highly relevant to some of the work that I do on a regular basis with soil survey data. Dealing with thousands of polygons which share a common attribute is usually a nice time to GeomUnion them together. I have only been successful at working with rather small geometry collections, and something like CascadedUnion sounds fantastic. What would it take to get this implemented in PostGIS ? It might also be worth having in GRASS as well. Cheers, Dylan > > Martin Davis wrote: > > As I mentioned in my previous post, buffer has to pull all geometries > > into memory at once and process them all together. It creates a lot > > of internal data structures in the course of processing. I'm not that > > suprised that it doesn't work on 30K geometries. Or, there may well be > > a memory leak - our test cases don't actually include one with 30K > > geoms 8^) (Lee, I'd be interested to see if this works in JTS - can > > you send me your dataset as a shapefile?) > > > > I can think of a couple of things to do: > > - Get the new version of JTS (from me) and use the CascadedUnion class > > - Use a simple grid index on your data to partition it, union each > > partition separately, and then union the results. This process would > > look something like: - pick an appropriate grid size (say 10x10 ?) > > - compute the interior point of each geometry, determine which grid > > cell it lies in, and save this in a new column > > - union all geoms in each grid cell together > > - union the result together > > > > Note: I haven't actually implemented this approach, but it's the only > > way I can see to reduce the amount of geometry you are working with at > > any given step. > > > > Lee Keel wrote: > >>> -----Original Message----- > >>> From: [EMAIL PROTECTED] > >>> [mailto:postgis-users- > >>> [EMAIL PROTECTED] On Behalf Of Kevin Neufeld > >>> Sent: Tuesday, November 13, 2007 12:40 AM > >>> To: PostGIS Users Discussion > >>> Subject: Re: [postgis-users] geomunion revisited.... > >>> > >>> Also, I think you may be after the SQL syntax UNION ALL, not UNION, if > >>> you are simply after concatenating the resultset from table 2 to > >>> table 1. > >>> > >>> select simplify(... > >>> UNION ALL > >>> select simplify(... > >>> > >>> Also, I too would be curious how buffer(collect(...), 0) fares for you > >>> instead of geomunion() for your final 5 hour query. > >>> Since you're doing this all at once anyway, it may work. > >>> Geomunion() is > >>> an aggregate for the two geometry method. Which means that it will > >>> slowly add one geometry at a time to an every growing resulting > >>> geometry. Buffer(collect(...), 0) will decontruct the collection once, > >>> and rebuild it once to a union-ed geometry. > >>> -- Kevin > >>> > >>> Josh Livni wrote: > >>> > >>>> I think running buffer(bunch_of_geoms,0) rather than > >>>> geomunion(bunch_of_geoms) might be faster. > >>>> > >>>> That said, still seems it takes a bit long. If it were me I'd try > >>>> playing around by first creating a table with your results, eg: > >>>> > >>>> '''create table simple_buffers as ( > >>>> select simplify(buffer(simplify(the_geom, 5), 400), 150) as the_geom > >>>> from table1 > >>>> union > >>>> select simplify(buffer(simplify(the_geom, 5), 400), 150) as the_geom > >>>> from table2); ''' > >>>> > >>>> then you could create a GiST index on the_geom of your new table, run > >>>> vacuum analyze on it, and then try the geomunion and buffer to test. > >>>> With 32k+ rows, depending on how much they overlap, that might help a > >>>> little, or at least help potentially isolate the issue if it does take > >>>> as long. > >>>> Also I assume you've modified your postgres config away from the > >>>> defaults to take advantage of more memory on your machine, etc. > >>>> > >>>> > >>>> > >>>> -Josh > >>>> > >> [Lee Keel] > >> Well, I have played with this some more and here is what I have found. > >> > >> I tried using the buffer(collect(the_geom), 0) on the largest table > >> (about > >> 30K rows) and I get the following error: > >> > >> NOTICE: St9bad_alloc > >> > >> ERROR: GEOS buffer() threw an error! > >> SQL state: XX000 > >> > >> > >> This entire table is only 5872KB but when I monitor this buffer > >> process the > >> memory gets over 1.7GB before it finally crashes out. Even if you > >> were to > >> triple the memory I can't see where it would ever get to 1.7GB. > >> NOTE: When > >> monitoring the process of doing just the collect(the_geom) the memory > >> footprint gets up to about 26-28MB and takes about 2.5 minutes. This > >> leads > >> me to think there might be a memory leak in the buffer(). > >> > >> > >> I tried using a test table. This new table is 3704KB. But I > >> realized that > >> a gist index is not going to buy me anything because it is still > >> going to > >> pull the entire table. > >> > >> > >> I have listed my config settings below. I am not sure how much more > >> I can > >> tweak them since I am running on 64bit Vista. I have had a few posts > >> in the > >> last week that have basically said that the windows version of this > >> can't be > >> tweaked much more due to constraints by windows. > >> > >> shared_buffers = 256MB > >> temp_buffers = 32MB > >> max_prepared_transactions = 100 > >> work_mem = 16MB > >> maintenance_work_mem = 256MB > >> max_stack_depth = 3MB > >> > >> Thanks for everyone's help and attention to this. > >> Lee > >> > >> This email and any files transmitted with it are confidential and > >> intended solely for the use of the individual or entity to whom they > >> are addressed. If you have received this email in error please notify > >> the sender. This message contains confidential information and is > >> intended only for the individual named. If you are not the named > >> addressee you should not disseminate, distribute or copy this e-mail. > >> _______________________________________________ > >> postgis-users mailing list > >> [EMAIL PROTECTED] > >> http://postgis.refractions.net/mailman/listinfo/postgis-users > >> > >> > > > > -- > Martin Davis > Senior Technical Architect > Refractions Research, Inc. > (250) 383-3022 > > _______________________________________________ > postgis-users mailing list > [EMAIL PROTECTED] > http://postgis.refractions.net/mailman/listinfo/postgis-users > _______________________________________________ grassuser mailing list grassuser@grass.itc.it http://grass.itc.it/mailman/listinfo/grassuser