RE: [OSGeo-Discuss] OS Spatial environment 'sizing' + Image Management

2008-02-21 Thread Bruce . Bannerman
IMO:

Thanks for the comments, Michael.

I was wondering if you'd contribute ;-)



 (Also, note that wavelet does not necessarily imply lossy anymore,
 as many assume.  Story of my life.)


Can you point me to any studies supporting the claim that JPEG 2000 can 
indeed be non-lossy?

I've seen the claims over the years, but nothing to support them (not that 
I've actively gone looking for the info, as I haven't had the need).


Bruce


Re: [OSGeo-Discuss] OS Spatial environment 'sizing' + Image Management

2008-02-20 Thread Arnulf Christl
So, does anyone know of a good open source spatial solution for 
storing and accessing (multi- and hyperspectral) imagery in a 
database? ;-)

WMS 1.3 and WCS are showing promise for serving imagery, including 
multi- and hyperspectral data.

Bruce Bannerman

[EMAIL PROTECTED] wrote on 20/02/2008 10:09:28 AM:

[snip]

RE: [OSGeo-Discuss] OS Spatial environment 'sizing' + Image Management

2008-02-20 Thread Michael P. Gerlek
Interesting thread...  A couple points from the sidelines:

My company sells a store-your-images-in-a-database product, for storing
JPEG 2000 and MrSID imagery; there are indeed people who see value in
using a DB to manage their raster assets.

Our product is not open source, but when using it with JPEG 2000 images
it *is* designed around the appropriate JP2 standards for storing the
data.  Very loosely speaking, it uses JP2's internal tiling scheme and
each such tile is stored as a blob.  Each band can be stored separately,
for the sort of workflows you describe.  (Also, note that wavelet does
not necessarily imply lossy anymore, as many assume.  Story of my
life.)
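
A loose sketch of that kind of tile-per-blob layout, written as a generic 
SQL table filled over JDBC. This is an illustration of the general idea 
only, not the product's actual schema; the table and column names are 
hypothetical:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Minimal sketch: store one image tile per row, keyed by band and
    // tile coordinates. Assumed schema:
    //   CREATE TABLE image_tiles (
    //       image_id integer, band integer,
    //       tile_x integer, tile_y integer,
    //       data bytea,
    //       PRIMARY KEY (image_id, band, tile_x, tile_y));
    public class TileStore {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/imagery", "gis", "secret");
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO image_tiles VALUES (?, ?, ?, ?, ?)");
            ps.setInt(1, 1);   // image id
            ps.setInt(2, 0);   // band
            ps.setInt(3, 12);  // tile x
            ps.setInt(4, 7);   // tile y
            ps.setBytes(5, Files.readAllBytes(Paths.get("tile_12_7.jp2")));
            ps.executeUpdate();
            conn.close();
        }
    }

Storing each band in its own rows, as described above, is what lets a 
workflow pull a single band or a small viewbox without reading the whole 
image back out of the database.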

The whole
how-do-I-do-an-image-processing-workflow-across-a-chain-of-servers
thing keeps me up at night, esp. the points you bring up below (eek,
tile boundaries!).  I think WPS is moving slower for a few reasons:
OGC specs proceed at a deliberate pace, there are (relatively) few
people involved in WPS, and -- most importantly to me -- the workflows
are still not well enough understood to have a critical mass of people
pushing for a baseline functionality set.

-mpg

 

 -----Original Message-----
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Randy George
 Sent: Wednesday, February 20, 2008 7:38 AM
 To: 'OSGeo Discussions'
 Subject: RE: [OSGeo-Discuss] OS Spatial environment 'sizing' 
 + Image Management
 
 Hi Ivan and Bruce,
 
 Interesting; other than using JAI a bit on Space Imaging data (this was 
 a while back), I have mostly been using vectors.
 
 I am curious to know what advantage an ArcSDE/Oracle stack would provide 
 for image storage. I had understood imagery was simply stored as large 
 blob fields and streamed in and out of the DB, where it is 
 processed/viewed etc. The original state, I had understood, was unchanged 
 (lossy, wavelet, pk or otherwise happening outside the DB), just residing 
 in the DB directory rather than the disk hierarchy. Other than possible 
 table corruption issues, I imagined that the overhead of streaming a blob 
 into an image object was the only real concern with DB storage.
 
 But I'm getting the idea that something a bit more is going on. Does the 
 image actually get retiled (celled) and then stored in multiple fields? 
 Is a multispectral image broken into bands before being stored in 
 separate fields? CHIP sounds more like an additional database function to 
 optimize chipping inside a DB table so that an entire image doesn't have 
 to be read just to grab a small viewbox. Does ArcSDE add similar 
 functions to the base DB, or does it just grab out an image and chip, 
 threshold, convolve, histogram, etc. after the fact?
 
 I'm just curious, since I've been fascinated with the prospects of 
 hyperspectral imagery.
 
 From an AWS perspective, very large imagery would need some type of 
 tiling, since there is a 5 GB limit on S3 objects. Larger objects are 
 typically tar-gzipped and split before storage. It is hard to imagine a 
 tiling scheme that large anyway. For example, Google's Digital Globe 
 tiling pyramid uses minuscule tiles at 256x256, compressed to approx 
 18 KB/tile:
 http://kh.google.com/kh?v=3&t=trtsqtqsqqqt
 http://www.cadmaps.com/gisblog/?p=7
 
 From a web perspective, analysis could proceed along a highly tiled 
 approach. So the original 70 GB image becomes a tiled pyramid, with the 
 browser view changing position inside the image pyramid. Small patches 
 flow in and out of the view with each zoom and pan. Analysis (WPS) adds 
 some complexity, since things like convolution algorithms need to be 
 rewritten to take tile boundaries into account. Or, alternatively, the 
 viewbox is re-mosaicked before running a server-side convolution that is 
 subsequently streamed back to the browser view, which is not extremely 
 fast. Hyperspectral bands would reside in separate tile pyramids so that 
 Boolean layer operations could proceed server side for viewing at the 
 browser. Analysis really can't take advantage of predigested read-only 
 schemes like Google's, since the whole point is to create new images 
 from combinations of image bands. Consequently, WPS seems to be moving 
 more slowly than WMS, WCS, and WFS.
 
 Thanks
 Randy
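
To make the tile-pyramid arithmetic in the quoted message concrete, here 
is a rough back-of-the-envelope sketch in plain Java; treating the 70 GB 
figure as a single band at 8 bits per pixel is an assumption made only 
for illustration:

    // Rough tile-count estimate for a tiled image pyramid, assuming the
    // 70 GB image above is a single band at 8 bits per pixel (~7e10 px)
    // cut into 256x256 tiles.
    public class PyramidEstimate {
        public static void main(String[] args) {
            long totalPixels = 70L * 1000 * 1000 * 1000;
            long side = (long) Math.sqrt((double) totalPixels); // ~264,575 px square
            long tilesPerSide = (side + 255) / 256;             // ceiling division
            long baseTiles = tilesPerSide * tilesPerSide;       // ~1.07M full-res tiles
            // Each overview level halves the linear resolution, so the whole
            // pyramid adds roughly 1/3 on top of the base (1/4 + 1/16 + ...).
            long totalTiles = baseTiles + baseTiles / 3;
            System.out.printf("side ~%d px, base tiles ~%d, pyramid total ~%d%n",
                    side, baseTiles, totalTiles);
        }
    }

At the quoted ~18 KB/tile, a pyramid of that size would come to very 
roughly 25 GB of compressed tiles.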


[snip]


RE: [OSGeo-Discuss] OS Spatial environment 'sizing'

2008-02-19 Thread Randy George
Hi Bruce,

 

On the "scale relatively quickly" front, you should look at
Amazon's EC2/S3 services. I've recently worked with it and find it an
attractive platform for scaling: http://www.cadmaps.com/gisblog

The stack I like is Ubuntu + Java + PostgreSQL/PostGIS + Apache2 mod_jk +
Tomcat + GeoServer + custom SVG or XAML clients run out of Tomcat.

 

If you use the larger instances the cost is higher, but it
sounds like you plan on some heavy raster services (WMS, WCS), and lots of
memory will help.

Small EC2 instance ($0.10/hr):

1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute
Unit), 160 GB of instance storage, 32-bit platform

Large EC2 instance ($0.40/hr):

7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute
Units each), 850 GB of instance storage, 64-bit platform

Extra large EC2 instance ($0.80/hr):

15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute
Units each), 1690 GB of instance storage, 64-bit platform

 

Note that the instances do not need to be permanent. Some people (WeoGeo)
have been using a couple of failover small instances and then starting new
large instances for specific requirements. The idea is to start and stop
instances as required rather than carrying ongoing infrastructure costs. It
only takes a minute or so to start an EC2 instance. If you are running a
corporate service there may be parts of the day with very little use, so you
just schedule your heavy-duty instances for peak times. (For example, at the
quoted rates, a large instance run only for eight peak hours a day costs
about 8 x $0.40 x 30 = $96 a month, versus roughly $288 a month around the
clock.) If you can connect your raster to S3 buckets rather than instance
storage you have built-in replicated backup.

 

I know that Java JAI can easily eat up memory, and it is core to GeoServer
WMS/WCS, so you probably want a large memory footprint for any platform
with lots of raster service. I'm partial to GeoServer because of its Java
foundation. I think I would try to keep the Apache2 mod_jk Tomcat
GeoServer on a separate server instance from PostGIS. This might avoid
problems at instance startup, since your database would need to be loaded
separately. The instance AMI resides in a 10 GB partition; the balance of
the data will probably reside on a /mnt partition separate from
ec2-run-instances. You may be able to avoid datadir problems by adding
something like Elastra to the mix. Elastra beta is a wrapper for
PostgreSQL that puts the datadir on S3 rather than local to an instance. I
suppose they still keep indices (GiST et al.) on the local instance.

(I still think it an interesting exercise to see what could be done
connecting PostGIS to AWS SimpleDB services.)

 

So, thinking out loud, here is a possible architecture:

Basic permanent setup

  put raster in S3 (this may require some customization of GeoServer)

  build a datadir in PostGIS and back it up to S3

  create a private AMI for PostgreSQL/PostGIS

  create a private AMI for the load balancer instance

  create a private AMI with your service stack, for both a small and a
  large instance type, for flexibility

Startup services

  start a balancer instance

  point your DNS CNAME to this balancer instance

  start a PostGIS instance (you could have more than one if necessary, but
  it would be easier to just scale to a larger instance type if the load
  demands it)

  have a scripted download from an S3 backup to your PostGIS datadir (I'm
  assuming a relatively static data resource; a sketch follows this
  outline)

Variable services

  start a service stack instance and connect it to PostGIS

  update the balancer to see the new instance (this could be tricky)

  repeat the previous two steps as needed

  at night, scale back: cron the scaling for a known cycle, or use a
  controller like WeoCEO to detect and respond to load fluctuation
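
As a minimal sketch of that scripted S3 download, assuming a publicly
readable backup object addressed over plain HTTP (the bucket and file
names are hypothetical; a private bucket would need a real S3 client and
signed requests):

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URL;

    // Minimal sketch: pull a backup object from S3 into a local staging
    // area for the PostGIS datadir. Bucket and key names are hypothetical.
    public class S3BackupFetch {
        public static void main(String[] args) throws Exception {
            URL src = new URL(
                    "http://s3.amazonaws.com/my-gis-backups/postgis-datadir.tar.gz");
            try (InputStream in = src.openStream();
                 OutputStream out = new FileOutputStream(
                         "/mnt/postgis-datadir.tar.gz")) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
        }
    }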

 

By the way, the public AWS AMI with the best resources that I have found is
Ubuntu 7.10 Gutsy. The Debian dependency tools are much easier to use and
the resources are plentiful.

I've been toying with using an AWS stack adapted to serving some larger
PostGIS vector sets, such as fully connected census demographic data and
block polygons here in the US. The idea would be to populate the data
directly from the census SF* and TIGER files with a background Java bot.
There are some potentially novel 3D viewing approaches possible with XAML.
Anyway, lots of fun to have access to virtual systems like this.

As you can see, I'm excited anyway.

randy

 

 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Monday, February 18, 2008 6:35 PM
To: OSGeo Discussions
Subject: [OSGeo-Discuss] OS Spatial environment 'sizing'

 


IMO: 


Hello everyone, 

I'm trying to get a feel for server 'sizing' for a **hypothetical**
Corporate environment to support OS Spatial apps. 



Assume that: 

- this is a dedicated environment to allow the use of OS Spatial
applications to serve Corporate OGC Services. 

- the applications of interest are GeoServer, Deegree, GeoNetwork,
MapServer, MapGuide and Postgres/PostGIS. 

- the environment may need to scale relatively quickly. 

Re: [OSGeo-Discuss] OS Spatial environment 'sizing'

2008-02-19 Thread Cameron Shorter

Randy, what an informative email.
It is almost a HOWTO for OSGeo hardware and performance tuning. I'm 
not aware of anyone who has written something similar (although I admit 
I have not looked).


I'd love to see it incorporated into an easily referenced resource, 
maybe a chapter in
http://wiki.osgeo.org/index.php/Educational_Content_Inventory

Also, a link from http://wiki.osgeo.org/index.php/Case_Studies .

What do you think?

Randy George wrote:

[snip]

Re: [OSGeo-Discuss] OS Spatial environment 'sizing'

2008-02-19 Thread Lucena, Ivan

Hi Randy, Bruce,

That is a nice piece of advice, Randy. I am sorry to intrude on the 
conversation, but I would like to ask how that heavy raster 
manipulation would be handled by PostgreSQL/PostGIS: managed or unmanaged?


Best regards,

Ivan

Randy George wrote:

[snip]

RE: [OSGeo-Discuss] OS Spatial environment 'sizing'

2008-02-19 Thread Randy George
Hi Ivan,

The most common advice I've seen says to leave raster out of the DB.
Of course footprints and metadata could live there, but you would want to
point the GeoServer coverage at the image/image-pyramid URL somewhere in
the directory hierarchy.

Brent has a nice writeup here:
http://docs.codehaus.org/display/GEOSDOC/Load+NASA+Blue+Marble+Data

In an AWS sense, my idea is to Java-proxy the GeoServer coverage data URL
to S3 buckets and park the imagery over on the S3 side to take advantage
of stability and replication. Performance, though, might not be as good as
a local directory. Maybe a one-time cache to a local directory would work
better.

Note: Amazon doesn't charge for data transfers inside AWS.

So in theory:
  PostGIS holds the footprint geometry + metadata (see the query sketch
  just below)
  EC2 GeoServer WFS handles footprint queries into an SVG/XAML client;
  just stick it on top of something like JPL BMNG. Once a user picks a
  coverage, switch to the GeoServer WMS/WCS service for zooming around in
  the selected image pyramid
  S3 buckets contain the TIFFs, pyramids, ...
  EC2 GeoServer handles the WMS/WCS service
  EC2 proxy pulls the imagery from the S3 side as needed
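
A minimal sketch of that footprint query over JDBC, assuming a
PostGIS-enabled database and the PostgreSQL JDBC driver; the table and
column names (footprints, name, geom) are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Minimal sketch: find image footprints intersecting a view box.
    // Table/column names are hypothetical.
    public class FootprintQuery {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/imagery", "gis", "secret");
            String sql = "SELECT name FROM footprints "
                    + "WHERE ST_Intersects(geom, ST_GeomFromText(?, 4326))";
            PreparedStatement ps = conn.prepareStatement(sql);
            ps.setString(1,
                    "POLYGON((144 -38, 146 -38, 146 -36, 144 -36, 144 -38))");
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getString("name")); // candidate coverages
            }
            conn.close();
        }
    }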

Sorry, I haven't had time to try this, so it is just theoretical. Of course
you can go traditional and just keep the coverage imagery files on the
local instance, avoiding the S3 proxy idea. The reason I don't like that
idea is that the imagery has to be loaded with every instance creation,
while an S3 approach would need only one copy.
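
And a minimal servlet-style sketch of the S3 proxy idea itself; the bucket
URL and path layout are hypothetical, and a production version would add
caching, content types, and error handling:

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URL;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Minimal sketch: relay GeoServer coverage reads to S3. The request's
    // path info is used as the S3 key; the bucket name is hypothetical.
    public class S3CoverageProxy extends HttpServlet {
        private static final String BUCKET = "http://s3.amazonaws.com/my-imagery";

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws java.io.IOException {
            URL src = new URL(BUCKET + req.getPathInfo()); // e.g. /tiles/z/x/y.tif
            try (InputStream in = src.openStream();
                 OutputStream out = resp.getOutputStream()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
        }
    }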


randy

-----Original Message-----
From: Lucena, Ivan [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 19, 2008 2:59 PM
To: [EMAIL PROTECTED]; OSGeo Discussions
Subject: Re: [OSGeo-Discuss] OS Spatial environment 'sizing'

[snip]

RE: [OSGeo-Discuss] OS Spatial environment 'sizing' + Image Management

2008-02-19 Thread Bruce . Bannerman
IMO:


Hi Randy,

Thank you for your informative post. It has given me a lot to follow up on 
and think about.

I can see an immediate need that this type of solution could well be used 
for. I like it.

I suspect that in many larger corporate environments, it could well be 
used effectively for 'pilot' and 'pre-production' tasks.

For 'production' environments, there would be issues in integrating an 
external service hosting spatial data with internal services hosting 
corporate aspatial data sources and applications.



With regards to storing imagery in a database:

<rant> (and not directed at you)

I've also seen a lot of reports suggesting that image management should be 
file based.

My personal preference is to use a database if possible, so that I can 
take advantage of corporate data management facilities: backups, 
point-in-time restores, etc.

I've managed 70 GB orthophoto mosaics in ArcSDE / Oracle before with 
minimal problems. I found performance and response times to be comparable 
with other image web server options on the market that use file-based 
solutions for storing data.

Ideally, I'm looking to manage state-wide mosaics with a consistent look 
and feel that can be treated as a single 'layer' by client GIS / Remote 
Sensing applications (data integrity issues allowing).

One potential use is 'best available' data mosaics that could undergo 
regular updates as more imagery is flown or captured. A database makes it 
easier to manage and deliver such data.

My definition of 'imagery' goes beyond aerial photographs and includes 
multi- or hyper-spectral imagery; various geophysics data sources such as 
aeromagnetics, gravity and radiometrics; radar data; etc.

Typically this data is required for digital image analysis using a remote 
sensing application, so the integrity of 'the numbers' that make up the 
image is very important.

Many of today's image-based solutions use a (lossy) wavelet compression 
that can corrupt the integrity of 'the numbers' describing the 
radiometric data in the image.

When we consider the big-picture issues facing us today, such as Climate 
Change, I think it is important to protect our definitive image libraries 
from such corruption, as they will be invaluable sources of data for 
future multi-temporal analysis.

That said, if the end use is just for a picture, then a wavelet 
compression is a good option. Just protect the source data for future use.

</rant>
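
As a minimal sketch of what protecting 'the numbers' means in practice, a
compression round trip can be verified pixel for pixel. This sketch simply
compares two raw band dumps byte for byte; the file names are
hypothetical, and a real workflow would read the bands through a format
library such as GDAL:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Arrays;

    // Minimal sketch: confirm that a decompressed band is bit-identical
    // to the original, i.e. the compression really was lossless. The raw
    // band dumps named here are hypothetical.
    public class LosslessCheck {
        public static void main(String[] args) throws IOException {
            byte[] original = Files.readAllBytes(Paths.get("band1_original.raw"));
            byte[] roundTrip = Files.readAllBytes(Paths.get("band1_roundtrip.raw"));
            boolean identical = Arrays.equals(original, roundTrip);
            System.out.println(identical
                    ? "lossless: pixel values preserved"
                    : "lossy: pixel values were altered");
        }
    }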


So, does anyone know of a good open source spatial solution for storing 
and accessing (multi- and hyperspectral) imagery in a database? ;-)

WMS 1.3 and WCS are showing promise for serving imagery, including multi- 
and hyperspectral data.



Bruce Bannerman





[EMAIL PROTECTED] wrote on 20/02/2008 10:09:28 AM:

[snip]

Re: [OSGeo-Discuss] OS Spatial environment 'sizing' + Image Management

2008-02-19 Thread Lucena, Ivan
[snip]

Re: [OSGeo-Discuss] OS Spatial environment 'sizing'

2008-02-18 Thread Paul Ramsey
Bruce,

On 2/18/08, [EMAIL PROTECTED] wrote:
 - the applications of interest are GeoServer, Deegree, GeoNetwork,
 MapServer, MapGuide and Postgres/PostGIS.

 - the environment may need to scale relatively quickly.

 - it will be required to serve in the vicinity of 5 to 10 TB of data
 initially (WMS, WFS, WCS).
 - Of the above OS Spatial products, which ones could co-exist on the same
 server (excluding Postgres/PostGIS)?

Putting the Java applications into the same application server would
save a fair amount of memory. Running Java applications takes a
surprising amount of memory, so having them share a runtime would add
efficiency.

I think the best thing folks could do to make a corporate open source
spatial strategy work would be to give folks a means of easily creating
apps and moving them through the devel/test/production chain: access to
scripting languages with database access (PHP, Python, whatever) and a
standard application packaging convention that allows folks to deploy from
a tag. Basically, once an app is done in development, tag it, push
"deploy", and it's pulled into test without human hands touching it. Once
it's passed test, again, mash a button and boom, it's live on production.
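
A minimal sketch of that tag-driven deploy, assuming Subversion-style tags
and shelling out to the svn client; the repository URL, tag name, WAR name,
and Tomcat path are all hypothetical:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    // Minimal sketch: export a tagged build and drop its WAR into Tomcat's
    // webapps directory, so "deploy from a tag" needs no human hands.
    // Repository URL, tag, and paths are hypothetical.
    public class DeployFromTag {
        public static void main(String[] args) throws Exception {
            String tag = args.length > 0 ? args[0] : "release-1.0";
            Process export = new ProcessBuilder(
                    "svn", "export",
                    "http://svn.example.org/app/tags/" + tag, "/tmp/" + tag)
                    .inheritIO().start();
            if (export.waitFor() != 0) {
                throw new RuntimeException("svn export failed");
            }
            Files.copy(Paths.get("/tmp/" + tag + "/app.war"),
                    Paths.get("/var/lib/tomcat/webapps/app.war"),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }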

Fun fun fun!

P