Re: [postgis-users] High Concurrency R* in GiST?

2011-12-06 Thread Howard Butler
libspatialindex does have thread safety and support for custom index data 
storage.  One of the Python Rtree users has implemented ZODB storage of 
libspatialindex data, for example, so the building blocks are in place to 
support this, at least in theory.
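
For quick experimentation, basic use of the Python Rtree wrapper looks roughly 
like the sketch below.  The custom-storage hook is not shown here, and the R* 
variant flag on the Property object is my recollection of the API, so treat it 
as an assumption worth double-checking:

    from rtree import index

    # Build an in-memory libspatialindex R-tree; RT_Star selects the R* variant.
    props = index.Property()
    props.variant = index.RT_Star
    idx = index.Index(properties=props)

    # Insert items keyed by integer id with (minx, miny, maxx, maxy) bounds.
    idx.insert(1, (0.0, 0.0, 10.0, 10.0))
    idx.insert(2, (20.0, 20.0, 30.0, 30.0))

    # Window query: ids of items whose bounds intersect the box.
    print(list(idx.intersection((5.0, 5.0, 25.0, 25.0))))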

Howard

On Dec 6, 2011, at 11:46 AM, C. Mundi wrote:

> Thanks, Howard.  That actually may help.  We are looking at possibly 
> borrowing some talent from another group who is experienced in db internals 
> and may be able to piece something together.  If that happens I will advocate 
> giving it back to the community, whatever "it" becomes.
> 
> Thanks!
> Carlos
> 
> On Dec 6, 2011 10:20 AM, "Howard Butler"  wrote:
> http://libspatialindex.github.com/ is a C/C++ implementation of an R*-tree with 
> quadratic splitting that supports bulk loading. It doesn't integrate with 
> PostGIS, of course, but you may find it useful.
> 
> Additionally, http://toblerity.github.com/rtree/ is available to allow you to 
> play with it in Python for rapid prototyping.
> 
> Hope this helps,
> 
> Howard
> 
> On Dec 6, 2011, at 10:58 AM, C. Mundi wrote:
> 
> > Indeed, I am a strong believer that "worse is better" in the absence of 
> > evidence.  It just so happens that I have evidence that R* holds a significant 
> > advantage over a plain R-tree in my particular application.  So the question 
> > becomes whether I can gain enough productivity elsewhere to make the impact of 
> > R v. R* moot.  I expect I can, now that the hope of a free lunch has been taken 
> > away.  :)
> >
> > Thanks for the link -- it is one of my favorites.
> >
> > Carlos
> > On Dec 6, 2011 2:44 AM, "Jan Hartmann"  wrote:
> > "Worse is better" :-) See:
> >
> > http://www.jwz.org/doc/worse-is-better.html
> >
> > Cheers,
> >
> > Jan
> >
> > On 12/05/2011 10:09 PM, Paul Ramsey wrote:
> >> On Mon, Dec 5, 2011 at 11:05 AM, C. Mundi wrote:
> >>
> >>
> >>> I get the impression that GiST hides a lot of
> >>> implementation details.  So I am hungry for details which will help me
> >>> assess PostGIS/PostgreSQL for my application.
> >>>
> >> This is the key point, and it is so: the physical implementation
> >> details are hidden behind the GiST API. As a result the R-Tree
> >> implementation is a "standard" one, not an R* (though the split method
> >> is Ang/Tan, not Guttman). And as a result you can't do things like
> >> rebalance the tree as specified in the R* recipe. The GiST API really
> >> is quite narrow. You have the consistent function controlling reads and
> >> the compress/picksplit functions controlling writes.
> >>
> >> So if you're looking for optimal tree construction, you've come to the
> >> wrong place. The primary benefit of the PostGIS indexing system is not
> >> its optimality but its existence: it's already here, you can
> >> insert and query data with simple SQL, and it does do locking and
> >> consistent operations thanks to the PostgreSQL infrastructure wrapped
> >> around it.
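
A minimal illustration of that "simple SQL" path from Python, assuming a 
hypothetical table pois(name text, geom geometry) that already has a GiST 
index on geom (the connection string and values are made up too):

    import psycopg2

    conn = psycopg2.connect("dbname=gisdb")
    cur = conn.cursor()

    # Insert a point, then run a bounding-box query; the && overlap operator
    # is what the GiST index answers through its consistent function.
    cur.execute("INSERT INTO pois (name, geom) "
                "VALUES (%s, ST_SetSRID(ST_MakePoint(%s, %s), 4326))",
                ("cafe", -122.48, 45.52))
    cur.execute("SELECT name FROM pois "
                "WHERE geom && ST_MakeEnvelope(-123, 45, -122, 46, 4326)")
    print(cur.fetchall())
    conn.commit()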
> >>
> >> As an architect my recommendation would be: since the development
> >> overhead of building your system from scratch will be quite high,
> >> investing in a load test on PostGIS first could save you a lot of
> >> effort if it turns out that even our imperfect system is actually
> >> good enough to meet your needs.
> >>
> >> Best,
> >>
> >> P.

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] High Concurrency R* in GiST?

2011-12-06 Thread Howard Butler
http://libspatialindex.github.com/ is a C/C++ implementation of an R*-tree with 
quadratic splitting that supports bulk loading. It doesn't integrate with 
PostGIS, of course, but you may find it useful.

Additionally, http://toblerity.github.com/rtree/ is available to allow you to 
play with it in Python for rapid prototyping.

Hope this helps,

Howard

On Dec 6, 2011, at 10:58 AM, C. Mundi wrote:

> Indeed, I am a strong believer that "worse is better" in the absence of 
> evidence.  It just so happens that I have evidence that R* holds a significant 
> advantage over a plain R-tree in my particular application.  So the question 
> becomes whether I can gain enough productivity elsewhere to make the impact of 
> R v. R* moot.  I expect I can, now that the hope of a free lunch has been taken 
> away.  :)
> 
> Thanks for the link -- it is one of my favorites.
> 
> Carlos
> On Dec 6, 2011 2:44 AM, "Jan Hartmann"  wrote:
> "Worse is better" :-) See:
> 
> http://www.jwz.org/doc/worse-is-better.html
> 
> Cheers,
> 
> Jan
> 
> On 12/05/2011 10:09 PM, Paul Ramsey wrote:
>> On Mon, Dec 5, 2011 at 11:05 AM, C. Mundi wrote:
>> 
>> 
>>> I get the impression that GiST hides a lot of
>>> implementation details.  So I am hungry for details which will help me
>>> assess PostGIS/PostgreSQL for my application.
>>> 
>> This is the key point, and it is so: the physical implementation
>> details are hidden behind the GiST API. As a result the R-Tree
>> implementation is a "standard" one, not an R* (though the split method
>> is Ang/Tan, not Guttman). And as a result you can't do things like
>> rebalance the tree as specified in the R* recipe. The GiST API really
>> is quite narrow. You have the consistent function controlling reads and
>> the compress/picksplit functions controlling writes.
>> 
>> So if you're looking for optimal tree construction, you've come to the
>> wrong place. The primary benefit of the PostGIS indexing system is not
>> its optimality but its existence: it's already here, you can
>> insert and query data with simple SQL, and it does do locking and
>> consistent operations thanks to the PostgreSQL infrastructure wrapped
>> around it.
>> 
>> As an architect my recommendation would be: since the development
>> overhead of building your system from scratch will be quite high,
>> investing in a load test on PostGIS first could save you a lot of
>> effort if it turns out that even our imperfect system is actually
>> good enough to meet your needs.
>> 
>> Best,
>> 
>> P.

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] geos 3.3.0 installing problem on Lion

2011-08-01 Thread Howard Butler

On Aug 1, 2011, at 3:54 PM, Mr. Puneet Kishor wrote:

>   Undefined symbols for architecture x86_64:
>   
> "__ZNSt8auto_ptrIN4geos4geom8EnvelopeEEcvSt12auto_ptr_refIT_EIS2_EEv", 
> referenced from:
>   virtual thunk to 
> geos::geom::GeometryCollection::computeEnvelopeInternal() const in 
> GeometryCollection.o
>   
> "std::auto_ptr::auto_ptr(std::auto_ptr_ref)",
>  referenced from:
>   virtual thunk to 
> geos::geom::GeometryCollection::computeEnvelopeInternal() const in 
> GeometryCollection.o
>   ld: symbol(s) not found for architecture x86_64
>   collect2: ld returned 1 exit status
>   make[2]: *** [libgeos.la] Error 1
>   make[1]: *** [all-recursive] Error 1
>   make: *** [all-recursive] Error 1
> 
> 
> h
> 
>   punkish@Lucknow ~/Projects/geos-3.3.0$sw_vers
>   ProductName:Mac OS X
>   ProductVersion: 10.7
>   BuildVersion:   11A2063
>   punkish@Lucknow ~/Projects/geos-3.3.0$arch
>   i386
> 
> What gives?


Reported earlier [1].

I spent a little time trying to fix this, but I couldn't come up with a 
solution.  

Howard

[1]: http://lists.osgeo.org/pipermail/geos-devel/2011-July/005364.html


___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] Database design for LIDAR data

2011-06-24 Thread Howard Butler

On Jun 24, 2011, at 4:04 PM, Jonathan Greenberg wrote:

> Interesting.  I came across this paper detailing the design of
> opentopography.org's lidar system, and they indicate they are doing
> something akin to loading the LAS data in and then building a spatial
> index (I'm too early in this game to know the difference between what
> they are describing and how the GiST index works):
> http://www.springerlink.com/content/x5q937840983un76/fulltext.pdf

That was their first (failed) attempt. Now I think they are storing bounding 
boxes that point to LAS files, similar to many raster management systems built 
with PostGIS et al.  The US Army Corps of Engineers group that I am a part of 
has been driving lots of Oracle SDO_PC development, especially on the data 
management side of storing the actual point data in the database. We're 10x+ 
faster at loading data than when we started, we've developed a few algorithms 
to speed things up, and the work has supported the ongoing development of 
libspatialindex/Rtree, PDAL, GDAL, libgeotiff, proj.4 and libLAS in the 
process. I have been pounding my head into this concrete wall for at least a 
couple of years now :)

> 
> Once I build an index for this 3-d data, setting aside the file size
> issues, should the spatial querying be relatively efficient?  

The cost of indexing all points, in processing time and index storage space, is 
simply not worth it when we start talking about billions to trillions of 
points.  The index just becomes another (significant) percentage of the burden 
the points already brought.  You will most likely never touch every point with 
windowed queries, except for the case when you ask "give me all the points", 
which doesn't need an index anyway.  An index of the bounds of tiles, even 3d 
ones, is going to be much more efficient.  If all of your queries are for 
windows smaller than your blocks, just decrease the block size instead of 
attempting to index the points within them.

libLAS does have an octree index you can use to generate indexes with optional 
z-binning, and the Point Cloud Library also has an easy-to-use octree (no 
hookups for LAS yet, though).

The most common spatial query for point cloud data is "here is my box, give me 
the points in this box".  Tiling the data and then quickly throwing out tiles 
that cannot match eliminates much more CPU and data touching than keeping a 
giant index of 50 billion points and walking it in some way to find candidates.

When the most common query becomes "here is my box, give me the points in this 
box that match these attributes", something else can be done to index those 
data inside the blocks.  We're simply not there yet, as most processing of 
point cloud data happens in exploitation/visualization software, and the act of 
doing window queries already lowers their i/o costs significantly.

From a data management perspective, I think point cloud data are best treated 
as a specialized type of raster data.  They just get too unwieldy if you start 
treating them as points in the vector sense.

> If so,
> how would I go about doing a cross-tile query?

In Oracle's case, you select the blocks that cross your window and then unpack 
and inspect the raw point data within those crossing blocks.  All the blocks 
completely contained within your window already satisfy your query.
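
The same two-phase pattern sketched against a hypothetical PostGIS blocks table 
(the table, columns, SRID, and window are all invented for the example):

    import psycopg2

    conn = psycopg2.connect("dbname=lidar")
    cur = conn.cursor()
    window = (430000.0, 4980000.0, 431000.0, 4981000.0)  # made-up query box

    # Blocks completely inside the window satisfy the query as-is...
    cur.execute("""
        SELECT block_id FROM lidar_blocks
        WHERE ST_Within(extent, ST_MakeEnvelope(%s, %s, %s, %s, 26915))
    """, window)
    contained = [row[0] for row in cur]

    # ...while blocks that merely cross its edge must have their packed point
    # data pulled back and filtered point by point on the client.
    cur.execute("""
        SELECT block_id, points FROM lidar_blocks
        WHERE extent && ST_MakeEnvelope(%s, %s, %s, %s, 26915)
          AND NOT ST_Within(extent, ST_MakeEnvelope(%s, %s, %s, %s, 26915))
    """, window + window)
    crossing = cur.fetchall()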

> 
> Howard, I am interested in checking out your tools but I don't have
> access to Oracle, just open source databases.  Can I use PostgreSQL to
> utilize your algorithms?

You can set up and install Oracle for demonstration and development purposes 
without cost.  I would suggest going to Oracle's site and fetching one of the 
"Developer Days" VirtualBox VMs to save yourself the hassle of trying to 
figure out how to set up the darn thing.  Hop on the PDAL list if you want to 
discuss that stuff more; we shouldn't burden this list with the minutiae of it.


Howard


___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] Database design for LIDAR data

2011-06-24 Thread Howard Butler

On Jun 24, 2011, at 1:46 PM, Jonathan Greenberg wrote:

> Folks:
> 
> This topic I believe has been brought up before, but I thought I'd
> send an email since I'm a bit of a noob with POSTGIS.  We have a large
> collection of Lidar points that I would like to perform spatial
> querying on (e.g. give me all points within a certain bounding box).
> The data (currently in LAS format, but easily loadable into the DB)
> is tiled up into smaller subsets.  The data is x,y,z,intensity (and
> some other attributes that aren't so important).  I have a few
> questions:
> 
> 1) Should I load ALL of the LAS files into one massive table for
> querying (this is going to be a LOT of points)?
> 2) If not, is there a trick whereby I could load each LAS file into a
> separate table (which would, in theory, be preferable since I'd like to
> do some testing before dealing with a database of this size), but
> somehow have a spatial query span multiple tables
> (e.g. say the query box is at the intersection of two adjacent tiles)?
> 
> Related: what is the most efficient way to do a spatial query that
> effectively "rasterizes" this data, e.g. the min z value between x1
> and x2, and y1 and y2, where x2-x1 and y2-y1 are the x and y pixel
> sizes?  I'm not talking about interpolation, I'm talking about an exact
> query.
> 

Jonathan,

Paul Ramsey and I have discussed what loading point cloud data into PostGIS 
would mean, and it's pretty clear it doesn't mean storing each point individually 
as a geometry :)  Oracle has something called SDO_PC, a point cloud object 
that references a table of "blocks".  Each of these blocks has a geometry that 
describes the bounds of the points within that block, and the points themselves 
are stored as a packed array of dumb bytes (a blob).  The user does their spatial 
querying using the bounding boxes of the blocks, rather than the individual 
points themselves, and then unpacks the data of the blocks that match the 
query only when they need to.
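
A hypothetical PostGIS analogue of that layout, just to make its shape concrete 
(the table name, SRID, and packed per-point record format are invented for the 
sketch):

    import struct
    import psycopg2

    conn = psycopg2.connect("dbname=lidar")
    cur = conn.cursor()

    # One row per block: the 2d bounds of its points plus the points themselves
    # packed into a blob, mirroring the SDO_PC block-table idea described above.
    cur.execute("""
        CREATE TABLE lidar_blocks (
            block_id   integer PRIMARY KEY,
            num_points integer NOT NULL,
            extent     geometry,
            points     bytea
        )""")
    cur.execute("CREATE INDEX lidar_blocks_extent_gist "
                "ON lidar_blocks USING GIST (extent)")

    # Pack a block of (x, y, z) doubles and store it with its bounding box.
    pts = [(430001.2, 4980003.4, 251.7), (430010.9, 4980020.1, 253.2)]
    blob = b"".join(struct.pack("<ddd", *p) for p in pts)
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    cur.execute("""
        INSERT INTO lidar_blocks (block_id, num_points, extent, points)
        VALUES (%s, %s, ST_MakeEnvelope(%s, %s, %s, %s, 26915), %s)
    """, (1, len(pts), min(xs), min(ys), max(xs), max(ys), psycopg2.Binary(blob)))
    conn.commit()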

I have been working on libLAS (and now PDAL) to load LAS data (and other point 
cloud formats) into Oracle, and except for the part that actually uses 
psql to write the block data into the database, most of the pieces are done.  
The essential piece to make this work is a blocking algorithm that optimizes 
fill capacity to minimize the number of blocks required to store the points.  
While a quad tree or other spatial indexing structure could be used, these are 
often optimized for query speed or neighborhood generation, and would end up 
creating many more tiles than necessary for storage in the blocks table.  
libLAS has a utility, lasblock (http://liblas.org/utilities/lasblock.html), that 
can be used for this operation.  It is integrated into the PDAL library too, 
as part of a pipeline for loading LAS data into Oracle.

Another component of this is description of the schema of the point cloud data 
being loaded.  PDAL has that one taken care of for you now, and it produces an 
XML document that describes the layout and arrangement of the points in the 
points blob for Oracle SDO_PC storage.  This is generic to all point cloud data 
types, and would be easily reusable inside a PostGIS context.

That said, it could be much more advantageous to have point cloud be an actual 
type so that PostGIS can take care of a lot of things for you.  Paul has a 
proposal looking for funding to do just that.  See 
http://opengeo.org/technology/postgis/coredevelopment/pointclouds/ for more 
details.

Feel free to drop by the PDAL mailing list if you want to investigate 
developing a (C++) driver to load PostGIS data.

Howard


___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] RTree Read

2010-09-20 Thread Howard Butler

On Sep 5, 2010, at 2:13 PM, Paul Ramsey wrote:

> Interesting paper:
> http://dke.cti.gr/pubs/confs/adbis02.pdf

It seems this is implemented in Python if you want to play around with it.

http://code.google.com/p/pyrtree/

Howard
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] lidar: what is the recommended wayof storing/indexing

2010-07-12 Thread Howard Butler
I have been working quite a bit with libLAS in coordination with Oracle Point 
Clouds (OPC) to implement reading and writing data to their implementation.  
There are a number of things that I like about it, but if something similar 
were to be implemented for PostGIS, I'd push for a few changes.  

The gist of OPC is to essentially store two tables, one with "blocks" and the 
other with a column containing the point cloud object that points to blocks 
within the block table.  LiDAR point data are typically stored as scaled 32 bit 
integers, and OPC stores these points as blobs (really!), with 64 bit integers 
for each dimension (Oracle can currently store up to twelve 64 bit dimensions 
IIRC).  The block table then contains a geometry that describes the 2d bounds 
of the points within these blobs, and any software that hopes to interact 
directly with the LiDAR points must either use the Oracle convenience functions 
or interpret the blobs themselves.  
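
To make the "interpret the blobs themselves" part concrete, decoding a block 
might look roughly like the following.  The record layout, scale, and offset 
here are invented; the real OPC encoding should be taken from Oracle's 
documentation or its convenience functions:

    import struct

    def decode_block(blob, scale=0.01, offset=0.0):
        """Decode a made-up layout of little-endian 64-bit integers, three
        dimensions (x, y, z) per point, applying scale/offset to recover
        floating-point coordinates from the scaled integers."""
        record = struct.Struct("<3q")
        for pos in range(0, len(blob), record.size):
            ix, iy, iz = record.unpack_from(blob, pos)
            yield (ix * scale + offset, iy * scale + offset, iz * scale + offset)

    # Round-trip two points through the same made-up encoding.
    pts = [(100.25, 200.50, 10.00), (101.00, 201.75, 11.25)]
    blob = b"".join(struct.pack("<3q", *(int(round(v / 0.01)) for v in p))
                    for p in pts)
    print(list(decode_block(blob)))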

The magic of ingesting LiDAR data into OPC is in the chipping/blocking 
algorithm.  Oracle's is a bit slow (and contained *within* the database), but 
it optimizes to ensure that the blocks are completely filled to capacity and as 
regular (squarish) in shape as possible.  I'm working on a different algorithm 
for libLAS that trades the completely-filled-to-capacity attribute for faster 
build time while doing its best to ensure squareness.  This means storing a few 
extra blocks, but at multiple times the build speed.  You can take some 
off-the-shelf spatial indexing algorithms and try to repurpose them for this 
task, but I haven't had much success getting desirable results with that 
approach. Generic spatial indexes mostly target the fast-query problem, whereas 
chipping up the data is really an organization problem.
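
A toy illustration of that kind of blocking pass, only to show the shape of the 
problem (this is not the lasblock algorithm, just a recursive median split that 
trades fill optimality for simplicity):

    def chip(points, capacity=1000):
        """Recursively split (x, y) points along the longer side of their
        bounds until every block holds at most `capacity` points; splitting
        the longer side keeps the blocks roughly square."""
        if len(points) <= capacity:
            return [points]
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2
        return chip(points[:mid], capacity) + chip(points[mid:], capacity)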

Given the chance to do it over again for PostGIS, I'd push for something 
similar to Paul's POINTPATCH proposal, minus the part about indexing *within* 
the patches (not really needed if you keep your patches small enough, and quite 
complex if we're to start joining indexes from multiple patches together for 
complex queries).  Each patch/block would know its kD bounds, and it would 
contain a pointer to some sort of schema document that describes the dimensions 
the patch stores.  I'm currently working on implementing this kind of mechanism 
for libLAS, with XML schemas for both OPC and LAS files.  I would also 
implement a POINTCLOUD object that is a pointer to n POINTPATCH objects, which 
would allow applications to interact with aggregates (I'm not sure what to do 
about the possible impedance between patches with regard to their 
dimensionality, however).

Howard

On Jul 8, 2010, at 11:26 AM, Paul Ramsey wrote:

> Generally (raw) LIDAR data is not on a regular grid, it's irregular.
> So it's not a raster problem, it's a billions-of-points problem, and
> not just a billions-of-points problem, but a
> billions-of-hyper-dimensional-points problem (though the indexing can
> be in just 2- or 3-d, really). So grid-based solutions aren't really
> going to do it.
> 
> P.

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] convex hull

2010-06-22 Thread Howard Butler

On Jun 22, 2010, at 10:09 AM, Stephen Crawford wrote:

> All,
> 
> I'm trying to get the polygon outline of a collection of points.  I used 
> ConvexHull, and what I got was--hence the name--convex.  Is there a way to 
> get a more exact outline?  In my image below, the points are in purple and 
> the convex hull is in red.  I want to get a poly that more closely matches 
> the points.
> 
> Thanks,
> Steve
> 
> 

You want a concave hull

http://ubicomp.algoritmi.uminho.pt/local/concavehull.html
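
For what it's worth, newer PostGIS releases (2.0 and later) also grew an 
ST_ConcaveHull function that does this in SQL; a quick sketch, assuming such a 
version and a made-up points table:

    import psycopg2

    conn = psycopg2.connect("dbname=gisdb")
    cur = conn.cursor()
    # Collect the points and ask for a hull that hugs them more tightly than
    # the convex hull; lower target percentages give a more concave result.
    cur.execute("SELECT ST_AsText(ST_ConcaveHull(ST_Collect(geom), 0.80)) "
                "FROM my_points")
    print(cur.fetchone()[0])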

Howard

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] "Resampling" polyline shapefiles.

2009-12-25 Thread Howard Butler

ogr2ogr -segmentize <max_dist>

will do what you want (e.g. ogr2ogr -segmentize 10 output.shp input.shp for a 
10-metre maximum segment length).

Otherwise http://yukongis.wik.is/How_To/Densify_Line has more info.

On Dec 24, 2009, at 11:41 PM, Hemant Bist  wrote:

I have a polyline shapefile with each polyline having 200 segments,  
each roughly 1 meter long.  I want to "resample" the polylines to a  
roughly 10 meter segment size, so that each polyline has roughly 20  
segments.  I understand this may result in some loss of information.



Can PostGIS help me with a conversion like this?  I need this  
done somewhat quickly.  I have looked at the documentation but did  
not find anything relevant.
I have installed PostGIS on Ubuntu and run some queries; that's my  
level of familiarity with PostGIS.

Would appreciate any pointers.

Best,
HB

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] Multipoint with Characteristics

2009-07-27 Thread Howard Butler


On Jul 27, 2009, at 11:35 AM, Kevin Ridge wrote:


Hello,
I am new to postgis and was wondering how to solve the following  
issue.


Let's say I have a million points, each of which has a color defined.  
Is there a way to represent this using a multipoint?  I am currently  
using a point per row, but each point having its own record makes  
inserting and calculating intersections very time consuming.


Is there any efficient way of doing this instead of making a million  
rows?



/me sells Paul's services.

Not really.  You want what Paul calls a "point patch" or what Oracle calls  
a point cloud.


I have quite a bit of experience with Oracle's point cloud stuff as  
well (and with linking libLAS to use it).  They basically store blobs (with  
indexing that points to *which* blob to fetch) and expect that  
applications beyond their simple query functions will interpret them.   
With both databases, storing point clouds in a way that is very  
useful without a lot of application support is kind of sketchy.

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [OSGeo-Discuss] Re: [postgis-users] A bit off topic, but FOSS GIS clients...

2008-01-02 Thread Howard Butler


On Jan 2, 2008, at 12:33 PM, Andrea Aime wrote:


Despite that, users must first and foremost understand that they are not
the driver.


... unless they pony up with money and/or time.  As Tim said, you are  
either a sink or a source.  As an open source developer, I invest in  
you, the user (in money and/or time), by providing documentation (as  
little as possible, to optimize my time), answering your questions  
directly, and coding in an effort to create more sources that provide  
me with leverage.  Everyone starts out as a sink.  The project only  
grows by producing more sources than sinks.  If you are identified as  
a sink with no hope of ever turning into a source, you will eventually  
be ignored.


If you want to have a really successful open source experience, you  
must aspire to become a source as quickly as possible.  As a source,  
you will receive differentially more investment (help, code, docs, and  
ideas) from other project principals than if your status as a source  
or sink is unclear.


Use the (and be a) source, Luke!

Howard


___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] does this projection exist?

2007-10-14 Thread Howard Butler

Rob,

Dig around on http://spatialreference.org and see if you might find  
something.


Howard

On Oct 14, 2007, at 11:47 PM, Rob Agar wrote:


hi all

I want to project lat/long points from WGS84 to a custom coordinate  
system, but since it's a fairly obvious projection, perhaps it  
already exists as an "official" spatial reference system.


Basically, what I want is:

- Mercator projection
- origin at 0 E, 0 N
- positive x is east, positive y is north
- units in metres

Does this already exist in spatial_ref_sys, and if so, how would I  
go about finding it?


If it doesn't exist, how should I construct the srtext & proj4text  
strings to add to the table?


cheers
Rob


___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users