[postgis-users] Geoprocessing & BigData

2016-01-18 Thread Ravi Pavuluri
Hi All,
I am checking whether there is a way to quickly process large datasets, such as 
census blocks, in PostGIS, and also to do so by leveraging a big data platform. 
I have a few questions in this regard.

1) When I try an intersect of sample census blocks with another polygon layer, 
PostGIS 2.2 (on Postgres 9.4) takes ~60 minutes (after optimizing per 
http://postgis.net/2014/03/14/tip_intersection_faster/ ), while ESRI ArcMap 
takes ~10 minutes. The PostGIS layers already have spatial indices. Is there 
any way to optimize this further?
2) What is the equivalent of ESRI Union in PostGIS? I didn't see any 
out-of-the-box function, and any tips here are appreciated.
3) Is there any way to expedite these geoprocessing tasks (union/intersect, 
etc.) using a big data platform (e.g. Hadoop)? Most examples talk about 
analysis (contains, etc.) but not about geoprocessing on geospatial data. 
Any input is appreciated.

Thanks,
Ravi.
___
postgis-users mailing list
postgis-users@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/postgis-users

Re: [postgis-users] Geoprocessing & BigData

2016-01-18 Thread Vincent Picavet (ml)
Hi Ravi,




On 18/01/2016 19:14, Ravi Pavuluri wrote:
> Hi All,
> 
> I am checking if there is a way to process quickly large datasets such
> as census blocks in PostGIS and also by leveraging big data platform. I
> have few questions in this regard.
> 
> 1) When I try intersect for sample census blocks with another polygon
> layer, PostGIS 2.2(on Postgres 9.4) takes ~60 minutes (after optimizing
> from http://postgis.net/2014/03/14/tip_intersection_faster/ ) while on 
> ESRI ArcMap takes ~10 minutes. PostGIS layers already have geospatial
> indices. Is there anyway to optimize this further?

Following the links on your page, here is a good answer from Paul (TL;DR:
ST_Intersection is slow, avoid it where you can):
http://gis.stackexchange.com/questions/31310/acquiring-arcgis-like-speed-in-postgis/31562
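
The core trick from that answer, sketched here with hypothetical table names
(census_blocks, clip_polygons): only pay for ST_Intersection on geometries
that actually straddle a boundary.

```sql
-- census_blocks and clip_polygons are placeholder names.
SELECT b.id,
       CASE WHEN ST_Within(b.geom, p.geom)
            THEN b.geom                            -- fully contained: copy as-is, no clipping
            ELSE ST_Intersection(b.geom, p.geom)   -- only clip the boundary cases
       END AS geom
FROM census_blocks b
JOIN clip_polygons p
  ON ST_Intersects(b.geom, p.geom);               -- index-assisted spatial join
```

Since most blocks fall entirely inside a polygon, the expensive clipping runs
on only a small fraction of the rows.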

> 2) What is an equivalent of ESRI Union in PostGIS? I didn't see any out
> of the box functions and any tips here are appreciated.

If ESRI Union computes a geometric union, maybe ST_Union? But I suspect there
are semantic differences: ESRI Union is an overlay that keeps the attributes
of both input layers, whereas ST_Union only merges geometries.
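
For reference, ST_Union works both as a two-geometry function and as an
aggregate that dissolves a whole set of rows; a full ESRI-style overlay union
would additionally need ST_Intersection plus ST_Difference in both directions
to keep both layers' attributes. A minimal aggregate sketch (hypothetical
table and column names):

```sql
-- Dissolve all block geometries of one state into a single (multi)polygon.
SELECT ST_Union(geom) AS state_outline
FROM census_blocks
WHERE state_fips = '06';
```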

> 3) Is there anyway we can expedite these geoprocessing
> tasks(union/intersect etc) using big data platform (Ex: hadoop)? Most
> examples talk about analysis (contains etc)  but not about geoprocessing
> on geospatial data. Any input is appreciated.

Lots of people do geoprocessing with PostGIS too, including long-running
jobs on large volumes of data (worldwide OSM data processing, notably).
"Big data" is a really subjective term. Are your geoprocessing needs
really parallelizable? What kind of volumes are we talking about: MB,
GB, TB? What kind of hardware do you have at hand?

One way to do some sort of map-reduce with PostGIS is to use a bunch of
servers with FDW connections between a source master and a set of slaves:
map the data processing to the slave servers and reduce the results on the
master. With a bit of Python as glue code this can be automated and made
quite efficient, even though this kind of sharding is not automatic (yet?).
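
A minimal sketch of that FDW wiring (server names, credentials and table
names are invented for illustration):

```sql
-- On the master: declare each worker as a foreign server.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER worker1
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'worker1.example.org', dbname 'gis');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER worker1
  OPTIONS (user 'gis', password 'secret');

-- Expose the partial results computed on the worker ("map" step).
CREATE FOREIGN TABLE worker1_results (block_id bigint, geom geometry)
  SERVER worker1
  OPTIONS (table_name 'partial_results');

-- "Reduce" step on the master: combine the partial results.
CREATE TABLE combined_results AS
SELECT * FROM worker1_results
UNION ALL
SELECT * FROM worker2_results;   -- one branch per worker server
```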

Vincent

> 
> Thanks,
> Ravi.
> 
> 
> ___
> postgis-users mailing list
> postgis-users@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-users
> 


Re: [postgis-users] Geoprocessing & BigData

2016-01-18 Thread Rémi Cura
Hey,
if you have one beefy server, you can parallelize by throwing several queries
at it, each working on a subset of your data
(aka parallel processing through data partitioning).
One conceptual example: you want to process the world, so you create 20
workers and a list of countries, then have the workers process the list
country by country.
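
In SQL terms, that conceptual example amounts to one query template
parameterized by country, with each worker running its own copy (table and
column names below are made up):

```sql
-- Each worker opens its own connection and runs this with its own country
-- code; since workers write disjoint partitions, they don't block each other.
INSERT INTO processed_blocks (block_id, geom)
SELECT b.block_id,
       ST_Intersection(b.geom, z.geom)
FROM census_blocks b
JOIN zones z ON ST_Intersects(b.geom, z.geom)
WHERE b.country_code = 'FR';   -- worker 1: 'FR'; worker 2: 'DE'; etc.
```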

If you think one postgres server will not be sufficient,
you could of course shard your data across several servers,
with options ranging from writing everything from scratch,
to using existing open source code, to dedicated solutions such as
Postgres-XC, Greenplum, ...

However, sorry to say this, but in your case it looks like your first
improvement step will not come from massive parallelism but from
better understanding the world of geospatial data and PostGIS.

Cheers,
Rémi-C


[postgis-users] Getting unpackaged PostGIS 2.0.7 into an extension?

2016-01-18 Thread Tom Watson
We have a 750GB production database that was originally created on PostgreSQL 
9.0 and has an unpackaged implementation of PostGIS 2.0.7. We're now on 
PostgreSQL 9.4, and we're still running the unpackaged PostGIS 2.0.7. We need 
to get to a packaged (extension-based) implementation of PostGIS 2.1.8. We have 
tried every conceivable approach using the SQL scripts in the 
…/share/extensions directories for PostgreSQL 9.1 and 9.4, but so far without 
success.

We would very much appreciate advice on how to accomplish this using an 
approach that doesn’t require a dump and restore. Our downtime window is only 
15 minutes, whereas a dump and restore would take nearly 24 hours.
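
One avenue that may be worth testing on a restored copy first (untested here,
and version details matter): after installing the 2.1.8 binaries and running
the PostGIS soft-upgrade SQL script so the loose objects match the installed
version, PostGIS supports packaging an unpackaged install into the extension:

```sql
-- Assumes the 2.1.8 postgis_upgrade SQL script has already been applied,
-- so the loose objects match the installed binaries.
CREATE EXTENSION postgis FROM unpackaged;
-- If the packaged objects then report an older minor version:
-- ALTER EXTENSION postgis UPDATE TO "2.1.8";
```

Both statements rewrite only catalog metadata, not the 750GB of data, so they
should fit a short downtime window; still, verify on a copy before relying on
that.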

Thanks in advance.
Tom
---

[postgis-users] Upcoming Conferences looking for speakers

2016-01-18 Thread Regina Obe
There are two conferences coming up that are looking for speakers and
should have PostGIS content:

1) FOSS4G NA 2016 - Raleigh, North Carolina May 2-5th, 2016
https://2016.foss4g-na.org/ 
Deadline for Early bird submissions - January 22nd, 2016
Final Deadline for submissions - February 8th, 2016


2) PGConf US 2016 NYC  -  http://www.pgconf.us/2016/ (Will be in Brooklyn,
NY) - Deadline is January 31st, 2016


Thanks,
Regina
http://www.postgis.us
http://postgis.net





Re: [postgis-users] Geoprocessing & BigData

2016-01-18 Thread Ravi Pavuluri
Vincent and Remi,

Thank you both for your input. I combined two things in one thread. 
Parallelization is a secondary need, and I will look into Postgres-XC, 
Greenplum, or a custom-code approach.

Regarding PostGIS performance on intersecting geometries, I am not able to 
see any improvement. I am looking at intersection because of my use case 
(e.g. what % of census blocks fall in the Zone A, Zone B, Zone C, etc. flood 
zones from a flood zones layer). If intersect is to be avoided, can this be 
achieved another way?
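
For that kind of coverage question, one possible sketch (table and column
names here are hypothetical) combines an indexed ST_Intersects join with the
clip-only-on-the-boundary trick from Paul's answer linked earlier in the
thread:

```sql
-- Hypothetical schema: census_blocks(block_id, geom), flood_zones(zone_class, geom).
-- Share of each block's area covered by each flood zone class.
SELECT b.block_id,
       z.zone_class,
       SUM(ST_Area(
             CASE WHEN ST_Within(b.geom, z.geom) THEN b.geom   -- fully inside: no clipping
                  ELSE ST_Intersection(b.geom, z.geom)          -- boundary case: clip for real
             END
       )) / ST_Area(b.geom) AS pct_covered
FROM census_blocks b
JOIN flood_zones  z ON ST_Intersects(b.geom, z.geom)            -- index-assisted join
GROUP BY b.block_id, z.zone_class, b.geom;
```

The CASE keeps the expensive ST_Intersection call off the (usually many)
blocks that lie entirely within a single zone.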


@Vincent : For ArcGIS Union, please see here. 
http://resources.esri.com/help/9.3/arcgisengine/java/gp_toolref/analysis_tools/union_analysis_.htm


Any inputs are appreciated. 

Thanks again,
Ravi.

