RE: [OSGeo-Discuss] OS Spatial environment 'sizing' + Image Management
IMO: Thanks for the comments Michael, I was wondering if you'd contribute ;-)

(Also, note that wavelet does not necessarily imply lossy anymore, as many assume. Story of my life.)

Can you point me to any studies to support the claim that JPEG2000 can indeed be non-lossy? I've seen the claims over the years, but nothing to support them (not that I've actively gone looking for the info, as I haven't had the need).

Bruce

___ Discuss mailing list Discuss@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/discuss
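For what it's worth on the lossless question: JPEG 2000 Part 1 (ISO/IEC 15444-1) does define a reversible mode built on the integer 5/3 (Le Gall) wavelet, so lossless JP2 is real, not just a marketing claim. The key is that the 5/3 filter is implemented as integer "lifting" steps, each of which is exactly invertible despite the rounding. A toy sketch (plain Python, 1-D, even-length signals, simplified symmetric boundary handling -- not a JP2 codec) showing the bit-exact round trip:

```python
# Reversible 5/3 (Le Gall) integer wavelet transform -- the filter JPEG 2000
# Part 1 uses for its lossless mode.  Plain Python sketch, 1-D, even-length
# signals only; the boundary handling is a simplified mirror extension, not
# the exact JP2 one, but the lifting structure guarantees exact inversion
# either way.

def fwd53(x):
    """Split x into lowpass s and highpass d using integer lifting."""
    n = len(x)
    assert n % 2 == 0
    ext = lambda i: x[i] if i < n else x[n - 2]          # mirror right edge
    d = [x[2*i + 1] - (x[2*i] + ext(2*i + 2)) // 2 for i in range(n // 2)]
    s = [x[2*i] + (d[i - 1 if i else 0] + d[i] + 2) // 4 for i in range(n // 2)]
    return s, d

def inv53(s, d):
    """Exactly invert fwd53: undo the update step, then the predict step."""
    n = 2 * len(s)
    even = [s[i] - (d[i - 1 if i else 0] + d[i] + 2) // 4 for i in range(n // 2)]
    ext = lambda i: even[i // 2] if i < n else even[(n - 2) // 2]
    x = []
    for i in range(n // 2):
        x.append(even[i])
        x.append(d[i] + (even[i] + ext(2*i + 2)) // 2)
    return x

signal = [12, 300, 7, 4095, 0, 255, 18, 1024]   # e.g. raw sensor DNs
s, d = fwd53(signal)
assert inv53(s, d) == signal                     # bit-exact round trip
```

The rounding inside each lifting step is undone exactly because the inverse subtracts the very same rounded quantity it added, so "wavelet" and "lossless" are compatible; lossy JP2 simply swaps in the non-integer 9/7 filter plus quantization.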
RE: [OSGeo-Discuss] OS Spatial environment 'sizing' + Image Management
Interesting thread... A couple of points from the sidelines:

My company sells a store-your-images-in-a-database product, for storing JPEG 2000 and MrSID imagery; there are indeed people who see value in using a DB to manage their raster assets. Our product is not open source, but when used with JPEG 2000 images it *is* designed around the appropriate JP2 standards for storing the data. Very loosely speaking, it uses JP2's internal tiling scheme and each such tile is stored as a blob. Each band can be stored separately, for the sort of workflows you describe. (Also, note that wavelet does not necessarily imply lossy anymore, as many assume. Story of my life.)

The whole how-do-I-do-an-image-processing-workflow-across-a-chain-of-servers thing keeps me up at night, esp. the points you bring up below (eek, tile boundaries!). I think WPS is moving slower for a few reasons: OGC specs proceed at a deliberate pace, there are (relatively) few people involved in WPS, and -- most importantly to me -- the workflows are still not well enough understood to have a critical mass of people pushing for a baseline functionality set.

-mpg

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Randy George Sent: Wednesday, February 20, 2008 7:38 AM To: 'OSGeo Discussions' Subject: RE: [OSGeo-Discuss] OS Spatial environment 'sizing' + Image Management

Hi Ivan and Bruce,

Interesting. Other than using JAI a bit on Space Imaging data (this was a while back) I have been mostly using vectors. I am curious to know what advantage an ArcSDE/Oracle stack would provide on image storage. I had understood imagery was simply stored as large blob fields and streamed in and out of the DB, where it is processed/viewed etc. The original state, I had understood, was unchanged (lossy, wavelet, pk or otherwise happening outside the DB), just residing in the DB directory rather than the disk hierarchy.
Other than possible table corruption issues, I imagined that the overhead for streaming a blob into an image object was the only real concern with DB storage. But I'm getting the idea that something a bit more is going on. Does the image actually get retiled (celled) and then stored in multiple fields? Is a multispectral image broken into bands first before storing in separate fields? CHIP sounds more like an additional database function to optimize chipping inside a DB table so that an entire image doesn't have to be read just to grab a small viewbox. Does ArcSDE add similar functions to the base DB, or does it just grab out an image and chip, threshold, convolute, histogram, etc. after the fact? I'm just curious, since I've been fascinated with the prospects of hyperspectral imagery.

From an AWS perspective, very large imagery would need some type of tiling since there is a 5 GB limit on S3 objects. Larger objects are typically tar-gzipped and split before storage. It is hard to imagine a tiling scheme that large anyway. For example, Google's Digital Globe tiling pyramid uses minuscule tiles at 256x256, compressed to approx 18 KB/tile: http://kh.google.com/kh?v=3t=trtsqtqsqqqt http://www.cadmaps.com/gisblog/?p=7

From a web perspective, analysis could proceed along a highly tiled approach. So the original 70 GB image becomes a tiled pyramid, with the browser view changing position inside the image pyramid. Small patches flow in and out of the view with each zoom and pan. Analysis (WPS) adds some complexity, since things like convolution algorithms need to be rewritten to take into account tile boundaries. Or, alternatively, the viewbox is re-mosaiced before running a server-side convolution that is subsequently streamed back to the browser view -- not extremely fast. Hyperspectral bands would reside in separate tile pyramids so that Boolean layer operations could proceed server side for viewing at the browser.
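The tile-boundary issue is usually handled not by rewriting the convolution but by handing it each tile plus a "halo" of pixels borrowed from neighboring tiles, so the kernel never sees an artificial edge inside the image. A toy sketch (plain Python lists, 3x3 mean filter; the tile and halo sizes are invented for illustration) showing the tiled result matching the whole-image result exactly:

```python
# Convolving a tiled image without seams: process each tile together with a
# 1-pixel halo from its neighbors, then keep only the tile's own pixels.
# Toy 3x3 mean filter on a small "image"; edges are clamped only at the
# true image borders, so interior tile boundaries leave no trace.

def mean3(img):
    """3x3 integer mean filter with clamped (replicated) borders."""
    h, w = len(img), len(img[0])
    get = lambda r, c: img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]
    return [[sum(get(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)) // 9
             for c in range(w)] for r in range(h)]

def mean3_tiled(img, tile=4, halo=1):
    """Same filter, but computed one tile (plus halo) at a time."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            r1, c1 = min(r0 + tile, h), min(c0 + tile, w)
            ra, rb = max(r0 - halo, 0), min(r1 + halo, h)   # crop with halo,
            ca, cb = max(c0 - halo, 0), min(c1 + halo, w)   # clamped at borders
            patch = mean3([row[ca:cb] for row in img[ra:rb]])
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = patch[r - ra][c - ca]
    return out

img = [[(r * 7 + c * 13) % 251 for c in range(12)] for r in range(12)]
assert mean3_tiled(img) == mean3(img)   # no visible tile seams
```

The halo only needs to be as wide as the kernel radius, so the extra I/O per tile is small; larger kernels (or chained operations) need proportionally wider halos, which is one reason WPS-style server-side processing gets awkward.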
Analysis really can't take advantage of predigested read-only schemes like Google's, since the whole point is to create new images from combinations of image bands. Consequently WPS seems to be moving more slowly than WMS, WCS, WFS.

Thanks
Randy

[snip]
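To put rough numbers on the pyramid idea: halving resolution per level, even a very large mosaic collapses to a single 256x256 tile within a dozen or so levels, and all the overview levels together add only about a third more tiles than the base level (the geometric series 1/4 + 1/16 + ... = 1/3). A quick sketch (pure Python; the dimensions are invented, roughly a 70 GB-class 3-band mosaic):

```python
# Tile pyramid bookkeeping for a large mosaic: count 256x256 tiles per level,
# halving the image each level until it fits in one tile.  The example
# dimensions (160,000 x 160,000 px, ~77 GB at 3 bytes/px) are made up.

from math import ceil

def pyramid_levels(width, height, tile=256):
    """Yield (level, tiles_x, tiles_y) from full resolution down to 1 tile."""
    level = 0
    while True:
        tx, ty = ceil(width / tile), ceil(height / tile)
        yield level, tx, ty
        if tx == 1 and ty == 1:
            break
        width, height = max(1, width // 2), max(1, height // 2)
        level += 1

levels = list(pyramid_levels(160_000, 160_000))
base = levels[0][1] * levels[0][2]
total = sum(tx * ty for _, tx, ty in levels)
# overviews add roughly 1/3 on top of the base level (geometric series)
print(len(levels), "levels;", base, "base tiles;", round(total / base, 2), "x total")
```

At ~18 KB per compressed tile, the whole pyramid stays well under the original raw size, which is what makes pan-and-zoom over a 70 GB image practical in a browser.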
RE: [OSGeo-Discuss] OS Spatial environment 'sizing'
Hi Bruce,

On the "scale relatively quickly" front, you should look at Amazon's EC2/S3 services. I've recently worked with it and find it an attractive platform for scaling: http://www.cadmaps.com/gisblog

The stack I like is Ubuntu + Java + PostgreSQL/PostGIS + Apache2 mod_jk Tomcat + Geoserver + custom SVG or XAML clients run out of Tomcat. If you use the larger instances the cost is higher, but it sounds like you plan on some heavy raster services (WMS, WCS) and lots of memory will help.

- Small EC2 instance, $0.10/hr: 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform
- Large EC2 instance, $0.40/hr: 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform
- Extra large EC2 instance, $0.80/hr: 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform

Note that the instances do not need to be permanent. Some people (WeoGeo) have been using a couple of failover small instances and then starting new large instances for specific requirements. The idea is to start and stop instances as required rather than having ongoing infrastructure costs. It only takes a minute or so to start an EC2 instance. If you are running a corporate service there may be parts of the day with very little use, so you just schedule your heavy duty instances for peak times.

If you can connect your raster to S3 buckets rather than instance storage, you have built-in replicated backup. I know that Java JAI can easily eat up memory and is core to Geoserver WMS/WCS, so you probably want to look at a large memory footprint for any platform with lots of raster service. I'm partial to Geoserver because of its Java foundation. I think I would try to keep the Apache2 mod_jk Tomcat Geoserver on a separate server instance from PostGIS.

This might avoid problems for instance startup, since your database would need to be loaded separately. The instance AMI resides in a 10 GB partition; the balance of data will probably reside on a /mnt partition separate from ec2-run-instances. You may be able to avoid datadir problems by adding something like Elastra to the mix. Elastra beta is a wrapper for PostgreSQL that puts the datadir on S3 rather than local to an instance. I suppose they still keep indices (GiST et al.) on the local instance. (I still think it an interesting exercise to see what could be done connecting PostGIS to AWS SimpleDB services.)

So, thinking out loud, here is a possible architecture:

Basic permanent setup
- put raster in S3 (this may require some customization of Geoserver)
- build a datadir in a PostGIS and back up to S3
- create a private AMI for PostgreSQL/PostGIS
- create a private AMI for the load balancer instance
- create a private AMI with your service stack, for both a small and a large instance, for flexibility

Startup services
- start a balancer instance
- point your DNS CNAME to this balancer instance
- start a PostGIS instance (you could have more than one if necessary, but it would be easier to just scale to a larger instance type if the load demands it)
- have a scripted download from an S3 backup to your PostGIS datadir (I'm assuming a relatively static data resource)

Variable services
- start a service stack instance and connect to PostGIS
- update the balancer to see the new instance (this could be tricky)
- repeat the previous two steps as needed
- at night scale back: cron scaling for a known cycle, or use a controller like WeoCEO to detect and respond to load fluctuation

By the way, the public AWS AMI with the best resources that I have found is Ubuntu 7.10 Gutsy. The Debian dependency tools are much easier to use and the resources are plentiful. I've been toying with using an AWS stack adapted for serving some larger PostGIS vector sets, such as fully connected census demographic data and block polygons here in the US.
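The "cron scaling for a known cycle" idea is easy to sketch as a policy function. Everything here is hypothetical for illustration (the peak hours, request thresholds, and instance counts are invented); a real deployment would wire the result to ec2-run-instances and the balancer update step:

```python
# Hypothetical time-of-day scaling policy for the "variable services" tier.
# Peak hours get extra large service-stack instances sized to recent load;
# off-peak falls back to the small failover pair.  All numbers illustrative.

PEAK_HOURS = range(8, 18)        # 08:00-17:59 local time, assumed peak window

def desired_instances(hour, avg_requests_per_min):
    """Return (small, large) instance counts for the given hour and load."""
    small = 2                            # always keep the failover pair up
    if hour in PEAK_HOURS:
        # one large instance per 500 req/min of sustained load, capped at 4
        large = min(4, max(1, avg_requests_per_min // 500))
    else:
        large = 0                        # nights: cron scales the big boxes down
    return small, large

print(desired_instances(12, 1800))   # midday, busy
print(desired_instances(3, 40))      # 3 AM, quiet
```

A cron job evaluating this every few minutes, starting or stopping instances to match, captures the start-and-stop-as-required economics: you pay $0.40/hr for the large boxes only during the hours the policy asks for them.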
The idea would be to populate the data directly from the census SF* and TIGER with a background Java bot. There are some potentially novel 3D viewing approaches possible with XAML. Anyway, lots of fun to have access to virtual systems like this. As you can see, I'm excited anyway.

randy

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Monday, February 18, 2008 6:35 PM To: OSGeo Discussions Subject: [OSGeo-Discuss] OS Spatial environment 'sizing'

IMO: Hello everyone, I'm trying to get a feel for server 'sizing' for a **hypothetical** Corporate environment to support OS Spatial apps. Assume that:
- this is a dedicated environment to allow the use of OS Spatial applications to serve Corporate OGC Services.
- the applications of interest are GeoServer, Deegree, GeoNetwork, MapServer, MapGuide and Postgres/PostGIS.
- the environment may need to scale relatively quickly.
Re: [OSGeo-Discuss] OS Spatial environment 'sizing'
Randy, what an informative email. It is almost a Howto for OSGeo hardware and performance tuning. I'm not aware of anyone who has written something similar (although I admit I have not looked). I'd love to see it incorporated into an easily referenced resource - maybe a chapter in http://wiki.osgeo.org/index.php/Educational_Content_Inventory Also, a link from http://wiki.osgeo.org/index.php/Case_Studies . What do you think?

Randy George wrote: [snip]
Re: [OSGeo-Discuss] OS Spatial environment 'sizing'
Hi Randy, Bruce,

That is a nice piece of advice, Randy. I am sorry to intrude on the conversation, but I would like to ask how that heavy raster manipulation would be treated by PostgreSQL/PostGIS, managed or unmanaged?

Best regards,

Ivan

Randy George wrote: [snip]
RE: [OSGeo-Discuss] OS Spatial environment 'sizing'
Hi Ivan,

The most common advice I've seen says to leave raster out of the DB. Of course footprints and metadata could be there, but you would want to point the Geoserver coverage to the image/image pyramid URL somewhere in the directory hierarchy. Brent has a nice writeup here: http://docs.codehaus.org/display/GEOSDOC/Load+NASA+Blue+Marble+Data

In an AWS sense, my idea is to Java-proxy the Geoserver coverage data URL to S3 buckets and park the imagery over on the S3 side to take advantage of stability and replication. Performance, though, might not be as good as a local directory. Maybe a one-time cache to a local directory would work better. Note: Amazon doesn't charge for inside-AWS data transfers.

So in theory:
- PostGIS holds the footprint geometry + metadata
- EC2 Geoserver WFS handles footprint queries into an SVG/XAML client; just stick it on top of something like JPL BMNG
- once a user picks a coverage, switch to the Geoserver WMS/WCS service for zooming around in the selected image pyramid
- S3 buckets contain the tiffs, pyramids ...
- EC2 Geoserver handles the WMS/WCS service
- EC2 proxy pulls the imagery from the S3 side as needed

Sorry, I haven't had time to try this so it is just theoretical. Of course you can go traditional and just keep the coverage imagery files on the local instance, avoiding the S3 proxy idea. The reason I don't like that idea is the imagery has to be loaded with every instance creation, while an S3 approach would need only one copy.

randy

-Original Message- From: Lucena, Ivan [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 19, 2008 2:59 PM To: [EMAIL PROTECTED]; OSGeo Discussions Subject: Re: [OSGeo-Discuss] OS Spatial environment 'sizing' [snip]
RE: [OSGeo-Discuss] OS Spatial environment 'sizing' + Image Management
IMO: Hi Randy,

Thank you for your informative post. It has given me a lot to follow up on and think about. I can see an immediate need that this type of solution could well be used for. I like it. I suspect that in many larger corporate types of environments, it could well be used effectively for 'pilot' and 'pre-production' type tasks. For 'production' type environments, there would be issues in integrating an external service hosting spatial data with internal services hosting corporate aspatial data sources and applications.

With regards to storing imagery in a database:

rant (and not directed at you)

I've also seen a lot of reports suggesting that image management should be file based. My personal preference is to use a database if possible, so that I can take advantage of corporate data management facilities: backups, point-in-time restores etc. I've managed 70 GB orthophoto mosaics in ArcSDE / Oracle before with minimal problems. I found performance and response times to be comparable with other image web server options on the market that use file based solutions for storing data.

Ideally, I'm looking to manage state-wide mosaics with a consistent look and feel that can be treated as a single 'layer' by client GIS / Remote Sensing applications (data integrity issues allowing). One potential use is 'best available' data mosaics that could undergo regular updates as more imagery is flown or captured. A database makes it easier to manage and deliver such data.

My definition of 'imagery' goes beyond aerial photographs and includes multi- or hyper-spectral imagery; various geophysics data sources such as aeromagnetics, gravity, radiometrics; radar data etc. Typically this data is required for digital image analysis purposes using a remote sensing application, so the integrity of 'the numbers' that make up the image is very important.
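For what it's worth, the tiles-as-blobs approach described earlier in the thread is simple to prototype with open source parts. A sketch using SQLite as a stand-in for a corporate database (the table and column names are invented for illustration; a real system would store compressed JP2/TIFF tile codestreams rather than raw bytes, and would add footprint metadata):

```python
# Minimal tiles-as-blobs prototype: one row per (band, tile), so individual
# bands of a multi/hyperspectral image can be fetched without touching the
# rest of the file.  SQLite stands in for a real DB; schema names are made up.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE image_tile (
    image_id INTEGER, band INTEGER,
    tile_x INTEGER, tile_y INTEGER,
    data BLOB,
    PRIMARY KEY (image_id, band, tile_x, tile_y))""")

def put_tile(image_id, band, tx, ty, payload):
    db.execute("INSERT INTO image_tile VALUES (?, ?, ?, ?, ?)",
               (image_id, band, tx, ty, sqlite3.Binary(payload)))

def get_tile(image_id, band, tx, ty):
    row = db.execute(
        "SELECT data FROM image_tile WHERE image_id=? AND band=? "
        "AND tile_x=? AND tile_y=?", (image_id, band, tx, ty)).fetchone()
    return bytes(row[0]) if row else None

# store a fake 256x256 one-byte-per-pixel tile for band 3 of image 1
put_tile(1, 3, 0, 0, bytes(256 * 256))
assert len(get_tile(1, 3, 0, 0)) == 65536   # fetch just that band's tile
```

The payoff of the primary key on (image_id, band, tile_x, tile_y) is exactly the "chipping" behavior discussed above: a small viewbox pulls only the handful of tile rows it intersects, and the database's backup and point-in-time restore machinery covers the imagery for free.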
Many of today's image based solutions use a (lossy) wavelet compression that can corrupt the integrity of 'the numbers' describing the radiometric data in the image. When we consider the big picture issues facing us today, such as Climate Change, I think that it is important to protect our definitive image libraries from such corruption, as they will be invaluable sources of data for future multi-temporal analysis. That said, if the end use is just for a picture, then a wavelet compression is a good option. Just protect the source data for future use.

/rant

So, does anyone know of a good open source spatial solution for storing and accessing (multi and hyperspectral) imagery in a database? ;-) WMS 1.3 and WCS are showing promise for serving imagery, including multi and hyperspectral data.

Bruce

Bruce Bannerman [EMAIL PROTECTED] wrote on 20/02/2008 10:09:28 AM: [snip]
- EC2 GeoServer handles the WMS/WCS service
- EC2 proxy pulls the imagery from the S3 side as needed

Sorry, I haven't had time to try this, so it is just theoretical. Of course you can go traditional and just keep the coverage imagery files on the local instance, avoiding the S3 proxy idea. The reason I don't like that idea is that the imagery has to be loaded with every instance creation, while an S3 approach would need only one copy.

randy
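The "one-time cache to a local directory" idea in the quoted message above can be sketched as a small fetch wrapper: pull a coverage object from S3 on first request, then serve the local copy. This is a hedged illustration only; the function names and object keys are invented, and the S3 call is mocked rather than using a real AWS client.

```python
# Sketch of an S3-backed fetch with a one-time local cache.
# S3 access is mocked; a real proxy would do an HTTP GET on the bucket.
import os
import tempfile

def make_cached_fetch(fetch_from_s3, cache_dir):
    """Return a fetch function that caches S3 objects in cache_dir."""
    def fetch(key):
        local_path = os.path.join(cache_dir, key.replace("/", "_"))
        if not os.path.exists(local_path):      # first request: pull from S3
            data = fetch_from_s3(key)
            with open(local_path, "wb") as f:
                f.write(data)
        with open(local_path, "rb") as f:       # every request: serve local copy
            return f.read()
    return fetch

# Usage with a mocked S3 bucket of image tiles (key name is hypothetical)
calls = []
def mock_s3(key):
    calls.append(key)                           # count round-trips to "S3"
    return b"TIFF-bytes-for-" + key.encode()

cache_dir = tempfile.mkdtemp()
fetch = make_cached_fetch(mock_s3, cache_dir)
fetch("pyramids/bluemarble/0/0.tif")
fetch("pyramids/bluemarble/0/0.tif")            # second hit comes from disk
print(len(calls))                               # S3 was touched only once
```

This is the appeal of the approach Randy describes: instance-local reads after the first request, with S3 holding the single durable copy.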
Re: [OSGeo-Discuss] OS Spatial environment 'sizing' + Image Management
-----Original Message-----
From: Lucena, Ivan [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 19, 2008 2:59 PM
To: [EMAIL PROTECTED]; OSGeo Discussions
Subject: Re: [OSGeo-Discuss] OS Spatial environment 'sizing'

Hi Randy, Bruce,

That is a nice piece of advice, Randy. I am sorry to intrude on the conversation, but I would like to ask how that heavy raster manipulation would be treated by PostgreSQL/PostGIS: managed or unmanaged?

Best regards,

Ivan

Randy George wrote:

Hi Bruce,

On the "scale relatively quickly" front, you should look at Amazon's EC2/S3 services. I've recently worked with it and find it an attractive platform for scaling: http://www.cadmaps.com/gisblog

The stack I like is Ubuntu + Java + PostgreSQL/PostGIS + Apache2 mod_jk Tomcat + GeoServer + custom SVG or XAML clients run out of Tomcat. If you use the larger instances the cost is higher, but it sounds like you plan on some heavy raster services (WMS, WCS), and lots of memory will help.

Small EC2 instance ($0.10/hr): 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform
Large EC2 instance ($0.40/hr): 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform
Extra-large EC2 instance ($0.80/hr): 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform

Note that the instances do not need to be permanent. Some people (WeoGeo) have been using a couple of failover small instances and then starting new large instances for specific requirements.
The idea is to start and stop instances as required rather than having ongoing infrastructure costs. It only takes a minute or so to start an EC2 instance. If you are running a corporate service there may be parts of the day with very little use, so you just schedule your heavy-duty instances for peak times. If you can connect your raster to S3 buckets rather than instance storage, you have built-in replicated backup.

I know that Java JAI can easily eat up memory and is core to GeoServer WMS/WCS, so you probably want to look at a large memory footprint for any platform with lots of raster service. I'm partial to GeoServer because of its Java foundation.

I think I would try to keep the Apache2 mod_jk Tomcat GeoServer on a separate server instance from PostGIS. This might avoid problems with instance startup, since your database would need to be loaded separately. The instance AMI resides in a 10 GB partition; the balance of the data will probably reside on a /mnt partition separate from ec2-run-instances. You may be able to avoid datadir problems by adding something like Elastra to the mix. The Elastra beta is a wrapper for PostgreSQL that puts the datadir on S3 rather than local to an instance. I suppose they still keep indices (GiST et al.) on the local instance. (I still think it an interesting exercise to see what could be done connecting PostGIS to AWS SimpleDB services.)
So, thinking out loud, here is a possible architecture.

Basic permanent setup:
- put raster in S3 - this may require some customization of GeoServer
- build a datadir in a PostGIS and back it up to S3
- create a private AMI for PostgreSQL/PostGIS
- create a private AMI for the load balancer instance
- create a private AMI with your service stack, for both a small and a large instance, for flexibility

Startup services:
- start a balancer instance
- point your DNS CNAME to this balancer instance
- start a PostGIS instance (you could have more than one if necessary, but it would be easier to just scale to a larger instance type if the load demands it)
- have a scripted download from an S3 backup to your PostGIS datadir (I'm assuming a relatively static data resource)

Variable services:
- start a service stack instance and connect it to PostGIS
- update the balancer to see the new instance - this could be tricky
- repeat the previous two steps as needed
- at night scale back - cron scaling for a known cycle, or use a controller like weoceo to detect and respond to load fluctuation

By the way, the public AWS AMI with the best resources that I have found is Ubuntu 7.10 Gutsy. The debian
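The "cron scaling for a known cycle" step above boils down to mapping hour of day to a desired instance count. A minimal sketch, under invented assumptions: the peak window and counts below are illustrative, and a real hourly cron job would compare the target against running instances and call ec2-run-instances (or a controller like WeoCEO) for the difference.

```python
# Sketch of cron scaling for a known daily cycle: how many service-stack
# instances should be running at a given hour. Schedule is hypothetical.
PEAK_HOURS = range(9, 18)   # assumed business-hours peak, 09:00-17:59

def desired_instances(hour, baseline=1, peak=4):
    """Return the target instance count for the hour: up for peak, back at night."""
    if hour in PEAK_HOURS:
        return peak
    return baseline

# The full day's plan; a cron job run hourly would act on one entry at a time.
plan = [desired_instances(h) for h in range(24)]
print(plan)
```

The design point is Randy's: capacity follows the known load cycle, so you only pay for large instances during the hours that need them.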
Re: [OSGeo-Discuss] OS Spatial environment 'sizing'
Bruce,

On 2/18/08, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
- the applications of interest are GeoServer, Deegree, GeoNetwork, MapServer, MapGuide and Postgres/PostGIS.
- the environment may need to scale relatively quickly.
- it will be required to serve in the vicinity of 5 to 10 TB of data initially (WMS, WFS, WCS).
- Of the above OS Spatial products, which ones could co-exist on the same server (excluding Postgres/PostGIS)?

Putting the Java applications into the same application server would save a fair amount of memory; running Java applications takes a surprising amount of memory, so having them share a runtime would add efficiency.

I think the best thing folks could do to make a corporate open source spatial strategy work would be to give people a means of easily creating apps and moving them through the devel/test/production chain: access to scripting languages with database access (PHP, Python, whatever) and a standard application packaging format that allows folks to deploy from a tag. Basically, once an app is done in development, tag it and push deploy, and it's pulled into test without human hands touching it. Once it's passed test, again, mash a button and boom, it's live on production. Fun fun fun!

P

___ Discuss mailing list Discuss@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/discuss
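The tag-and-promote flow described in the last message can be sketched as a tiny state machine: a tagged version advances devel -> test -> production, one button-press per stage. Everything here is illustrative (the stage names, the promote function and the tag are all invented, not any real deployment tool).

```python
# Sketch of the "deploy from a tag" promotion chain: each call to promote()
# is one button-press moving a tagged release to the next stage.
STAGES = ["devel", "test", "production"]

def promote(deployments, tag):
    """Advance a tag one stage; an unseen tag is first deployed to devel."""
    stage = deployments.get(tag)
    if stage is None:
        deployments[tag] = "devel"              # initial deploy from the tag
    elif stage != "production":
        deployments[tag] = STAGES[STAGES.index(stage) + 1]
    return deployments[tag]                     # production is a terminal stage

deployments = {}
promote(deployments, "app-1.2")                 # tagged and deployed to devel
promote(deployments, "app-1.2")                 # pulled into test, hands-free
print(promote(deployments, "app-1.2"))          # mash the button: production
```

The point of the sketch is the "without human hands touching it" property: promotion only ever moves the exact tagged artifact forward, never a rebuilt one.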