Re: [postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread Sufficool, Stanley
Option #1: Hardware RAID
Pros: Fast, application agnostic performance and failover. Can resize on some 
controllers and use a file system capable of dynamic resizing (ZFS, XFS, NTFS)
Cons: Expensive controller hardware, possibly complicated file system resizing 
on cheaper controllers. Filesystem may be offline during resize (unless ZFS).

Option #2: Software RAID
Pros: Linux and Windows has built in resizing for these volumes. Protected data 
when using RAID 10, 5, 1.
Cons: Can steal processor time when using RAID 5 & 6.

Option #3: Dedicated iSCSI, AoE, FC SAN
Pros: Not stealing processor from application server. Can use software RAID on 
SAN controller without application server performance penalty. Can be cheap if 
using existing server and LAN network/fabric.
Cons: You're managing another server, possibly storage fabric and switches. Can 
get expensive.

Option #4: Cheap onboard SATA JBOD RAID with resizable filesystem.
Pros: Not much if any processor overhead. Can hot-add disks on most SATA boards.
Cons: One disk goes and the whole dataset is shot.

Option #5: Table inheritance or View with INSTEAD OF triggers on insert, delete 
and update.
Pros: Can create your partitioned table on any disk.
Cons: Complicated setup and many potential pitfalls. May not be able to take 
full advantage of PostgreSQL indexing in some cases.

Option #6: Compressed file system
Pros: Cheap space savings. Can INCREASE read times where processor is faster 
than the disk subsystem.
Cons: May steal processor when disk subsystem is sufficiently fast (ie RAID).


My suggestion: Use a filesystem that allows online resizing (EXT3+, XFS, etc..) 
on a Hardware/Software RAID that allows hot drive addition. This will get you 
your 99.9% uptime.



>-Original Message-
>From: postgis-users-boun...@postgis.refractions.net
>[mailto:postgis-users-boun...@postgis.refractions.net] On
>Behalf Of P Kishor
>Sent: Tuesday, April 06, 2010 7:34 AM
>To: PostGIS Users Discussion
>Subject: [postgis-users] splitting a PostGIS db across multiple disks
>
>
>I asked this question yesterday, and received a very helpful
>pointer from Ben Madin re. TABLESPACES. As noted in my reply
>in that thread, I am also investigating the possibility of
>splitting a single table across multiple disks.
>
>However, I am going to post this question in a different way
>in this new thread.
>
>Suppose I have a table FOO0 that stores info about every state
>in the union. I know that some of these states will have mongo
>number of rows, but I don't have to build all the states
>immediately. So, I start with a few states' worth data,
>putting it in the default /usr/local/pgsql/data location.
>
>Then I start outgrowing that disk, and need to add another
>state, so I add another disk, create a new tablespace, and
>create a new table called FOO1 in this new tablespace. Then I
>can store the new states in FOO1. As long as I break up my
>table into FOO0, FOO1, FOO2, and so on, I can store each FOOn
>in a new tablespace. And, as long as I ensure that each FOOn
>table contains a geographically consistent spatial extent, I
>can build logic in my application to query the correct table.
>
>So, lets say 0 lon to -10 lon data are stored in FOO0, and -10
>lon to -20 lon in FOO1, then if the user requests data for -5
>lon to -15 lon, I will have to query both FOO0 and FOO1.
>
>More work for me, but it is doable, no? Any insights on how to
>handle something like this?
>
>A corollary question -- are their any speed advantages to
>actually creating multiple PostGIS instances, perhaps even
>splitting them across multiple machines? Of course, it is
>going to be a pain in the ass for me to maintain more than one
>instance of PostGres/PostGIS, so I am not thrilled at that
>possibility. I'd rather have a single instance just be
>managing data across multiple locations as required.
>
>
>--
>Puneet Kishor http://www.punkish.org
>Carbon Model http://carbonmodel.org
>Charter Member, Open Source Geospatial Foundation
>http://www.osgeo.org Science Commons Fellow,
>http://sciencecommons.org/about/whoweare/kishor
>Nelson Institute, UW-Madison http://www.nelson.wisc.edu
>---
>Assertions are politics; backing up assertions with evidence
>is science
>===
>___
>postgis-users mailing list postgis-users@postgis.refractions.net
>http://postgis.refractions.net/mailman/listinfo/postgis-users
>
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread Brian Modra
On 06/04/2010, P Kishor  wrote:
> On Tue, Apr 6, 2010 at 10:22 AM, Chris Hermansen
>  wrote:
>> What about using RAID?  No data reorganization necessary.
>>
>
> I am already using a RAID, and conceptually, a RAID is the same as a
> single disk. My issue arises when I run out of space on RAID as well.
> While I could extend the RAID, and theoretically never have any space
> limitation, I would still have to use TABLESPACES, just so I can store
> part of the db in another location on the expanded RAID space, and
> also have to utilize table partitioning, because, at the db level, I
> am dealing with a single table that is growing beyond what is
> available on the RAID.
>
> I am assuming that adding more disks to a RAID is not a simple task,
> because the RAID set has to be migrated to incorporate the newly added
> disk... I tried a RAID migration yesterday, and it takes a long time.
> So, tablespaces are essential. Table partitioning would allow me to
> deal with a single conceptual table, so my application would not be
> riddled with junky logic (if querying this geog. area, query that
> table, if querying that geog. area, query that other table and such
> nonsense).
>
> Wrt the dataset itself... it is all point data. My current estimate is
> that the raw dataset is going to be about 125 billion rows in a table.
> In terms of disk space, it will be about 4 TB raw, that is, before it
> goes into the db. Indexes and other db overhead might 1.5x or 2x it,
> so, say, max 8 TB.

This may cause a lot of discussion... but in my opinion (though I
posted a few minutes ago answering a comment about RAID) ... my
opinion is that if you can design your own software, then design out
RAID.

RAID is required because software needs lots of space, and disks are
not big enough. Its a compromise. Then add more disks, you increase
the probability of failure, so you have to build in redundancy... you
need expensive hardware RAID controllers, you need 3U or bigger
servers... cost goes up...

If you can split your dataset up logically, across multiple disks, and
(even better) multiple hosts... then it will be faster, you isolate
the chance of failure to just one section of your service (rather than
the whole service), and you can use less expensive servers, or get a
blade cabinet and really squeeze them in to use less server room space
and therefore less rental costs.

But... this assumes that you are at the point where you can design
your own software.
I've been in this position (being able to design my own PostGis -
based service) and this is how I did it, and how I'd do it again
(except if anything, more so).

It makes sense to me to use small servers, (relatively) small disks,
and smaller service applications.

>
>> http://wiki.postgresql.org/wiki/Community_Disk_Tuning_Guide
>>
>> As the above notes, disk performance (eg spindle speed) is important.
>>
>> Also, to get good performance, you need to understand your application's
>> performance needs.  Do you have a lot of small queries, etc etc.  If you
>> don't understand your application's performance needs, then you may be
>> better off running your application for awhile and measuring its
>> bottlenecks before you decide on a storage strategy.
>>
>> P Kishor wrote:
>>> I asked this question yesterday, and received a very helpful pointer
>>> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I
>>> am also investigating the possibility of splitting a single table
>>> across multiple disks.
>>>
>>> However, I am going to post this question in a different way in this new
>>> thread.
>>>
>>> Suppose I have a table FOO0 that stores info about every state in the
>>> union. I know that some of these states will have mongo number of
>>> rows, but I don't have to build all the states immediately. So, I
>>> start with a few states' worth data, putting it in the default
>>> /usr/local/pgsql/data location.
>>>
>>> Then I start outgrowing that disk, and need to add another state, so I
>>> add another disk, create a new tablespace, and create a new table
>>> called FOO1 in this new tablespace. Then I can store the new states in
>>> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on,
>>> I can store each FOOn in a new tablespace. And, as long as I ensure
>>> that each FOOn table contains a geographically consistent spatial
>>> extent, I can build logic in my application to query the correct
>>> table.
>>>
>>> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to
>>> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon,
>>> I will have to query both FOO0 and FOO1.
>>>
>>> More work for me, but it is doable, no? Any insights on how to handle
>>> something like this?
>>>
>>> A corollary question -- are their any speed advantages to actually
>>> creating multiple PostGIS instances, perhaps even splitting them
>>> across multiple machines? Of course, it is going to be a pain in the
>>> ass for me to maintain more than one inst

Re: [postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread Chris Hermansen
A big database, that's for sure.

You said in an earlier mail that you had a 1Tb drive.  one 1Tb drive is
not RAID.

If you think you're going to need 8Tb, then with the current price of
storage - about $100 per quality Tb -  why not just get your storage,
configure as RAID 10 or something like that, and be done with it?  No
adding disks, no time spent reconfiguring every time you add one, etc
etc.  Plus you get redundancy.

If you partition your tables, you need to have a key value of some kind
to organize the partitioning.  A bad choice of keys may lead to a few of
your disks getting more access than others which will give you less
optimal performance, or it may lead to you having stuff pile up on one
disk and eventually exhaust the space there.  Either of which could
require you to re-organize your keys and data, which would be expensive
and time consuming.



P Kishor wrote:
> On Tue, Apr 6, 2010 at 10:22 AM, Chris Hermansen
>  wrote:
>   
>> What about using RAID?  No data reorganization necessary.
>>
>> 
>
> I am already using a RAID, and conceptually, a RAID is the same as a
> single disk. My issue arises when I run out of space on RAID as well.
> While I could extend the RAID, and theoretically never have any space
> limitation, I would still have to use TABLESPACES, just so I can store
> part of the db in another location on the expanded RAID space, and
> also have to utilize table partitioning, because, at the db level, I
> am dealing with a single table that is growing beyond what is
> available on the RAID.
>
> I am assuming that adding more disks to a RAID is not a simple task,
> because the RAID set has to be migrated to incorporate the newly added
> disk... I tried a RAID migration yesterday, and it takes a long time.
> So, tablespaces are essential. Table partitioning would allow me to
> deal with a single conceptual table, so my application would not be
> riddled with junky logic (if querying this geog. area, query that
> table, if querying that geog. area, query that other table and such
> nonsense).
>
> Wrt the dataset itself... it is all point data. My current estimate is
> that the raw dataset is going to be about 125 billion rows in a table.
> In terms of disk space, it will be about 4 TB raw, that is, before it
> goes into the db. Indexes and other db overhead might 1.5x or 2x it,
> so, say, max 8 TB.
>
>
>   
>> http://wiki.postgresql.org/wiki/Community_Disk_Tuning_Guide
>>
>> As the above notes, disk performance (eg spindle speed) is important.
>>
>> Also, to get good performance, you need to understand your application's
>> performance needs.  Do you have a lot of small queries, etc etc.  If you
>> don't understand your application's performance needs, then you may be
>> better off running your application for awhile and measuring its
>> bottlenecks before you decide on a storage strategy.
>>
>> P Kishor wrote:
>> 
>>> I asked this question yesterday, and received a very helpful pointer
>>> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I
>>> am also investigating the possibility of splitting a single table
>>> across multiple disks.
>>>
>>> However, I am going to post this question in a different way in this new 
>>> thread.
>>>
>>> Suppose I have a table FOO0 that stores info about every state in the
>>> union. I know that some of these states will have mongo number of
>>> rows, but I don't have to build all the states immediately. So, I
>>> start with a few states' worth data, putting it in the default
>>> /usr/local/pgsql/data location.
>>>
>>> Then I start outgrowing that disk, and need to add another state, so I
>>> add another disk, create a new tablespace, and create a new table
>>> called FOO1 in this new tablespace. Then I can store the new states in
>>> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on,
>>> I can store each FOOn in a new tablespace. And, as long as I ensure
>>> that each FOOn table contains a geographically consistent spatial
>>> extent, I can build logic in my application to query the correct
>>> table.
>>>
>>> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to
>>> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon,
>>> I will have to query both FOO0 and FOO1.
>>>
>>> More work for me, but it is doable, no? Any insights on how to handle
>>> something like this?
>>>
>>> A corollary question -- are their any speed advantages to actually
>>> creating multiple PostGIS instances, perhaps even splitting them
>>> across multiple machines? Of course, it is going to be a pain in the
>>> ass for me to maintain more than one instance of PostGres/PostGIS, so
>>> I am not thrilled at that possibility. I'd rather have a single
>>> instance just be managing data across multiple locations as required.
>>>
>>>
>>>
>>>   
>> --
>> Regards,
>>
>> Chris Hermansen·mailto:chris.herman...@timberline.ca
>> tel+1.604.714.2878 · fax+1.604.733.0631 · mob+1.778.840.4625
>> Timberli

Re: [postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread P Kishor
On Tue, Apr 6, 2010 at 1:12 PM, P Kishor  wrote:
> On Tue, Apr 6, 2010 at 10:22 AM, Chris Hermansen
>  wrote:
>> What about using RAID?  No data reorganization necessary.
>>
>
> I am already using a RAID, and conceptually, a RAID is the same as a
> single disk. My issue arises when I run out of space on RAID as well.


I guess I could render all this discussion moot by going with a Drobo.
The Drobo with its "BeyondRAID" technology allows me to add drives as
time goes by, as my space need grows, as I acquire more funds to
purchase more hardware. To the installed PostGres/PostGIS instance, it
just appears as a logical drive that magically gets bigger when
needed. No RAID set migration, no table partitioning, no tablespaces
(unless absolutely required for performance purpose).

Anyone have any experience with speed when running a database on a
Drobo? They are selling a two-pack of Drobo with 32 TB total for under
$6000.

http://www.drobostore.com:80/servlet/ControllerServlet?Action=DisplayPage&Env=BASE&Locale=en_US&SiteID=drobo&id=ShoppingCartPage


> While I could extend the RAID, and theoretically never have any space
> limitation, I would still have to use TABLESPACES, just so I can store
> part of the db in another location on the expanded RAID space, and
> also have to utilize table partitioning, because, at the db level, I
> am dealing with a single table that is growing beyond what is
> available on the RAID.
>
> I am assuming that adding more disks to a RAID is not a simple task,
> because the RAID set has to be migrated to incorporate the newly added
> disk... I tried a RAID migration yesterday, and it takes a long time.
> So, tablespaces are essential. Table partitioning would allow me to
> deal with a single conceptual table, so my application would not be
> riddled with junky logic (if querying this geog. area, query that
> table, if querying that geog. area, query that other table and such
> nonsense).
>
> Wrt the dataset itself... it is all point data. My current estimate is
> that the raw dataset is going to be about 125 billion rows in a table.
> In terms of disk space, it will be about 4 TB raw, that is, before it
> goes into the db. Indexes and other db overhead might 1.5x or 2x it,
> so, say, max 8 TB.
>
>
>> http://wiki.postgresql.org/wiki/Community_Disk_Tuning_Guide
>>
>> As the above notes, disk performance (eg spindle speed) is important.
>>
>> Also, to get good performance, you need to understand your application's
>> performance needs.  Do you have a lot of small queries, etc etc.  If you
>> don't understand your application's performance needs, then you may be
>> better off running your application for awhile and measuring its
>> bottlenecks before you decide on a storage strategy.
>>
>> P Kishor wrote:
>>> I asked this question yesterday, and received a very helpful pointer
>>> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I
>>> am also investigating the possibility of splitting a single table
>>> across multiple disks.
>>>
>>> However, I am going to post this question in a different way in this new 
>>> thread.
>>>
>>> Suppose I have a table FOO0 that stores info about every state in the
>>> union. I know that some of these states will have mongo number of
>>> rows, but I don't have to build all the states immediately. So, I
>>> start with a few states' worth data, putting it in the default
>>> /usr/local/pgsql/data location.
>>>
>>> Then I start outgrowing that disk, and need to add another state, so I
>>> add another disk, create a new tablespace, and create a new table
>>> called FOO1 in this new tablespace. Then I can store the new states in
>>> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on,
>>> I can store each FOOn in a new tablespace. And, as long as I ensure
>>> that each FOOn table contains a geographically consistent spatial
>>> extent, I can build logic in my application to query the correct
>>> table.
>>>
>>> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to
>>> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon,
>>> I will have to query both FOO0 and FOO1.
>>>
>>> More work for me, but it is doable, no? Any insights on how to handle
>>> something like this?
>>>
>>> A corollary question -- are their any speed advantages to actually
>>> creating multiple PostGIS instances, perhaps even splitting them
>>> across multiple machines? Of course, it is going to be a pain in the
>>> ass for me to maintain more than one instance of PostGres/PostGIS, so
>>> I am not thrilled at that possibility. I'd rather have a single
>>> instance just be managing data across multiple locations as required.
>>>
>>>
>>>
>>
>> --
>> Regards,
>>
>> Chris Hermansen    ·    mailto:chris.herman...@timberline.ca
>> tel+1.604.714.2878 · fax+1.604.733.0631 · mob+1.778.840.4625
>> Timberline Natural Resource Group · http://www.timberline.ca
>> 401 · 958 West 8th Avenue  · Vancouver BC · Canada · V5Z 1E5
>>
>> __

Re: [postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread P Kishor
On Tue, Apr 6, 2010 at 10:22 AM, Chris Hermansen
 wrote:
> What about using RAID?  No data reorganization necessary.
>

I am already using a RAID, and conceptually, a RAID is the same as a
single disk. My issue arises when I run out of space on RAID as well.
While I could extend the RAID, and theoretically never have any space
limitation, I would still have to use TABLESPACES, just so I can store
part of the db in another location on the expanded RAID space, and
also have to utilize table partitioning, because, at the db level, I
am dealing with a single table that is growing beyond what is
available on the RAID.

I am assuming that adding more disks to a RAID is not a simple task,
because the RAID set has to be migrated to incorporate the newly added
disk... I tried a RAID migration yesterday, and it takes a long time.
So, tablespaces are essential. Table partitioning would allow me to
deal with a single conceptual table, so my application would not be
riddled with junky logic (if querying this geog. area, query that
table, if querying that geog. area, query that other table and such
nonsense).

Wrt the dataset itself... it is all point data. My current estimate is
that the raw dataset is going to be about 125 billion rows in a table.
In terms of disk space, it will be about 4 TB raw, that is, before it
goes into the db. Indexes and other db overhead might 1.5x or 2x it,
so, say, max 8 TB.


> http://wiki.postgresql.org/wiki/Community_Disk_Tuning_Guide
>
> As the above notes, disk performance (eg spindle speed) is important.
>
> Also, to get good performance, you need to understand your application's
> performance needs.  Do you have a lot of small queries, etc etc.  If you
> don't understand your application's performance needs, then you may be
> better off running your application for awhile and measuring its
> bottlenecks before you decide on a storage strategy.
>
> P Kishor wrote:
>> I asked this question yesterday, and received a very helpful pointer
>> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I
>> am also investigating the possibility of splitting a single table
>> across multiple disks.
>>
>> However, I am going to post this question in a different way in this new 
>> thread.
>>
>> Suppose I have a table FOO0 that stores info about every state in the
>> union. I know that some of these states will have mongo number of
>> rows, but I don't have to build all the states immediately. So, I
>> start with a few states' worth data, putting it in the default
>> /usr/local/pgsql/data location.
>>
>> Then I start outgrowing that disk, and need to add another state, so I
>> add another disk, create a new tablespace, and create a new table
>> called FOO1 in this new tablespace. Then I can store the new states in
>> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on,
>> I can store each FOOn in a new tablespace. And, as long as I ensure
>> that each FOOn table contains a geographically consistent spatial
>> extent, I can build logic in my application to query the correct
>> table.
>>
>> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to
>> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon,
>> I will have to query both FOO0 and FOO1.
>>
>> More work for me, but it is doable, no? Any insights on how to handle
>> something like this?
>>
>> A corollary question -- are their any speed advantages to actually
>> creating multiple PostGIS instances, perhaps even splitting them
>> across multiple machines? Of course, it is going to be a pain in the
>> ass for me to maintain more than one instance of PostGres/PostGIS, so
>> I am not thrilled at that possibility. I'd rather have a single
>> instance just be managing data across multiple locations as required.
>>
>>
>>
>
> --
> Regards,
>
> Chris Hermansen    ·    mailto:chris.herman...@timberline.ca
> tel+1.604.714.2878 · fax+1.604.733.0631 · mob+1.778.840.4625
> Timberline Natural Resource Group · http://www.timberline.ca
> 401 · 958 West 8th Avenue  · Vancouver BC · Canada · V5Z 1E5
>
> ___
> postgis-users mailing list
> postgis-users@postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>



-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
---
Assertions are politics; backing up assertions with evidence is science
===
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread Brian Modra
On 06/04/2010, Chris Hermansen  wrote:
> What about using RAID?  No data reorganization necessary.
>
> http://wiki.postgresql.org/wiki/Community_Disk_Tuning_Guide
>
> As the above notes, disk performance (eg spindle speed) is important.
>
> Also, to get good performance, you need to understand your application's
> performance needs.  Do you have a lot of small queries, etc etc.  If you
> don't understand your application's performance needs, then you may be
> better off running your application for awhile and measuring its
> bottlenecks before you decide on a storage strategy.

very good points.
Regarding RAID, as a general rule of thumb, RAID 1+0 is best for a
database, also called RAID 10.

To configure a good RAID system, make sure you have a multi-processor
mother-board, lots of memory, and an Adaptec RAID controller (hardware
RAID, not software RAID.)

>
> P Kishor wrote:
>> I asked this question yesterday, and received a very helpful pointer
>> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I
>> am also investigating the possibility of splitting a single table
>> across multiple disks.
>>
>> However, I am going to post this question in a different way in this new
>> thread.
>>
>> Suppose I have a table FOO0 that stores info about every state in the
>> union. I know that some of these states will have mongo number of
>> rows, but I don't have to build all the states immediately. So, I
>> start with a few states' worth data, putting it in the default
>> /usr/local/pgsql/data location.
>>
>> Then I start outgrowing that disk, and need to add another state, so I
>> add another disk, create a new tablespace, and create a new table
>> called FOO1 in this new tablespace. Then I can store the new states in
>> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on,
>> I can store each FOOn in a new tablespace. And, as long as I ensure
>> that each FOOn table contains a geographically consistent spatial
>> extent, I can build logic in my application to query the correct
>> table.
>>
>> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to
>> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon,
>> I will have to query both FOO0 and FOO1.
>>
>> More work for me, but it is doable, no? Any insights on how to handle
>> something like this?
>>
>> A corollary question -- are their any speed advantages to actually
>> creating multiple PostGIS instances, perhaps even splitting them
>> across multiple machines? Of course, it is going to be a pain in the
>> ass for me to maintain more than one instance of PostGres/PostGIS, so
>> I am not thrilled at that possibility. I'd rather have a single
>> instance just be managing data across multiple locations as required.
>>
>>
>>
>
> --
> Regards,
>
> Chris Hermansen·mailto:chris.herman...@timberline.ca
> tel+1.604.714.2878 · fax+1.604.733.0631 · mob+1.778.840.4625
> Timberline Natural Resource Group · http://www.timberline.ca
> 401 · 958 West 8th Avenue  · Vancouver BC · Canada · V5Z 1E5
>
> ___
> postgis-users mailing list
> postgis-users@postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>


-- 
Brian Modra   Land line: +27 23 5411 462
Mobile: +27 79 69 77 082
5 Jan Louw Str, Prince Albert, 6930
Postal: P.O. Box 2, Prince Albert 6930
South Africa
http://www.zwartberg.com/
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread Ben Madin
Puneet

I'd wonder if a view can do it off multiple tables ? Having said that, you 
might want to check the archives for inheritance. I recall someone ?Leo? 
mentioning something similar when I was talking about keeping an archive table 
without the current data, but being able to look up data from both as a way of 
dealing with Change of Support issues. I think you can have an inherited table 
with no data inheriting from a number of tables. You can stop the query 
searching all tables by having constraints on them - maybe they could reflect 
bounding boxes?


cheers

Ben




On 06/04/2010, at 22:33 , P Kishor wrote:

> I asked this question yesterday, and received a very helpful pointer
> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I
> am also investigating the possibility of splitting a single table
> across multiple disks.
> 
> However, I am going to post this question in a different way in this new 
> thread.
> 
> Suppose I have a table FOO0 that stores info about every state in the
> union. I know that some of these states will have mongo number of
> rows, but I don't have to build all the states immediately. So, I
> start with a few states' worth data, putting it in the default
> /usr/local/pgsql/data location.
> 
> Then I start outgrowing that disk, and need to add another state, so I
> add another disk, create a new tablespace, and create a new table
> called FOO1 in this new tablespace. Then I can store the new states in
> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on,
> I can store each FOOn in a new tablespace. And, as long as I ensure
> that each FOOn table contains a geographically consistent spatial
> extent, I can build logic in my application to query the correct
> table.
> 
> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to
> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon,
> I will have to query both FOO0 and FOO1.
> 
> More work for me, but it is doable, no? Any insights on how to handle
> something like this?
> 
> A corollary question -- are their any speed advantages to actually
> creating multiple PostGIS instances, perhaps even splitting them
> across multiple machines? Of course, it is going to be a pain in the
> ass for me to maintain more than one instance of PostGres/PostGIS, so
> I am not thrilled at that possibility. I'd rather have a single
> instance just be managing data across multiple locations as required.
> 
> 
> -- 
> Puneet Kishor http://www.punkish.org
> Carbon Model http://carbonmodel.org
> Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
> Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
> Nelson Institute, UW-Madison http://www.nelson.wisc.edu
> ---
> Assertions are politics; backing up assertions with evidence is science
> ===
> ___
> postgis-users mailing list
> postgis-users@postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread Chris Hermansen
What about using RAID?  No data reorganization necessary.

http://wiki.postgresql.org/wiki/Community_Disk_Tuning_Guide

As the above notes, disk performance (eg spindle speed) is important.

Also, to get good performance, you need to understand your application's
performance needs.  Do you have a lot of small queries, etc etc.  If you
don't understand your application's performance needs, then you may be
better off running your application for awhile and measuring its
bottlenecks before you decide on a storage strategy.

P Kishor wrote:
> I asked this question yesterday, and received a very helpful pointer
> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I
> am also investigating the possibility of splitting a single table
> across multiple disks.
>
> However, I am going to post this question in a different way in this new 
> thread.
>
> Suppose I have a table FOO0 that stores info about every state in the
> union. I know that some of these states will have mongo number of
> rows, but I don't have to build all the states immediately. So, I
> start with a few states' worth data, putting it in the default
> /usr/local/pgsql/data location.
>
> Then I start outgrowing that disk, and need to add another state, so I
> add another disk, create a new tablespace, and create a new table
> called FOO1 in this new tablespace. Then I can store the new states in
> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on,
> I can store each FOOn in a new tablespace. And, as long as I ensure
> that each FOOn table contains a geographically consistent spatial
> extent, I can build logic in my application to query the correct
> table.
>
> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to
> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon,
> I will have to query both FOO0 and FOO1.
>
> More work for me, but it is doable, no? Any insights on how to handle
> something like this?
>
> A corollary question -- are their any speed advantages to actually
> creating multiple PostGIS instances, perhaps even splitting them
> across multiple machines? Of course, it is going to be a pain in the
> ass for me to maintain more than one instance of PostGres/PostGIS, so
> I am not thrilled at that possibility. I'd rather have a single
> instance just be managing data across multiple locations as required.
>
>
>   

-- 
Regards,

Chris Hermansen·mailto:chris.herman...@timberline.ca
tel+1.604.714.2878 · fax+1.604.733.0631 · mob+1.778.840.4625
Timberline Natural Resource Group · http://www.timberline.ca
401 · 958 West 8th Avenue  · Vancouver BC · Canada · V5Z 1E5

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread Rick
You can make multiple disks act like one using a method called striping.

Here is the first link I googled.  Apparently it will increase performance
as well...

http://insights.oetiker.ch/linux/raidoptimization/


On Tue, Apr 6, 2010 at 10:58 AM, Brian Modra  wrote:

> On 06/04/2010, P Kishor  wrote:
> > I asked this question yesterday, and received a very helpful pointer
> > from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I
> > am also investigating the possibility of splitting a single table
> > across multiple disks.
> >
> > However, I am going to post this question in a different way in this new
> > thread.
> >
> > Suppose I have a table FOO0 that stores info about every state in the
> > union. I know that some of these states will have mongo number of
> > rows, but I don't have to build all the states immediately. So, I
> > start with a few states' worth data, putting it in the default
> > /usr/local/pgsql/data location.
> >
> > Then I start outgrowing that disk, and need to add another state, so I
> > add another disk, create a new tablespace, and create a new table
> > called FOO1 in this new tablespace. Then I can store the new states in
> > FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on,
> > I can store each FOOn in a new tablespace. And, as long as I ensure
> > that each FOOn table contains a geographically consistent spatial
> > extent, I can build logic in my application to query the correct
> > table.
> >
> > So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to
> > -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon,
> > I will have to query both FOO0 and FOO1.
>
> If you split the data by lat-long extents, then you will need to also
> create duplicates of some of the data... anything that intersects the
> boundary must be duplicated...
>
> I assume you have a table with state boundaries in it?
> You could use this as an "index" to the table name. Then store a state
> per table...
>
> So when you do a spatial query, you first use the extent of the query
> to find which state(s) are withing/intersecting/contain the query
> area. Then you can generate one or more spatial queries to the tables
> containing the states data.
>
> > More work for me, but it is doable, no? Any insights on how to handle
> > something like this?
>
> yes
>
> > A corollary question -- are their any speed advantages to actually
> > creating multiple PostGIS instances, perhaps even splitting them
> > across multiple machines? Of course, it is going to be a pain in the
> > ass for me to maintain more than one instance of PostGres/PostGIS, so
> > I am not thrilled at that possibility. I'd rather have a single
> > instance just be managing data across multiple locations as required.
>
> Using multiple machines means that parallel retrievals are possible.
> It has to make it faster.
> ... but your idea of splitting data geographically probably makes this
> unlikely to happen in the majority of queries.
>
> Rather than split the data geographically, is there some property in
> the tables that can be used to split it up... so that certain rows
> (based on this key) go to different tables?
> This may result in a better performance benefit, because a single
> spatial query will hit multiple databases, returning rows in parallel.
>
> But then again, all this depends on how you gather your data for a
> query... to make what I just described work, you'd have to execute
> multiple queries at once, i.e. multi-threading.
>
> >
> >
> > --
> > Puneet Kishor http://www.punkish.org
> > Carbon Model http://carbonmodel.org
> > Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
> > Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
> > Nelson Institute, UW-Madison http://www.nelson.wisc.edu
> > ---
> > Assertions are politics; backing up assertions with evidence is science
> > ===
> > ___
> > postgis-users mailing list
> > postgis-users@postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
> >
>
>
> --
> Brian Modra   Land line: +27 23 5411 462
> Mobile: +27 79 69 77 082
> 5 Jan Louw Str, Prince Albert, 6930
> Postal: P.O. Box 2, Prince Albert 6930
> South Africa
> http://www.zwartberg.com/
> ___
> postgis-users mailing list
> postgis-users@postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>



-- 
Cheers!
Rick
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread Brian Modra
On 06/04/2010, P Kishor  wrote:
> I asked this question yesterday, and received a very helpful pointer
> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I
> am also investigating the possibility of splitting a single table
> across multiple disks.
>
> However, I am going to post this question in a different way in this new
> thread.
>
> Suppose I have a table FOO0 that stores info about every state in the
> union. I know that some of these states will have mongo number of
> rows, but I don't have to build all the states immediately. So, I
> start with a few states' worth data, putting it in the default
> /usr/local/pgsql/data location.
>
> Then I start outgrowing that disk, and need to add another state, so I
> add another disk, create a new tablespace, and create a new table
> called FOO1 in this new tablespace. Then I can store the new states in
> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on,
> I can store each FOOn in a new tablespace. And, as long as I ensure
> that each FOOn table contains a geographically consistent spatial
> extent, I can build logic in my application to query the correct
> table.
>
> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to
> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon,
> I will have to query both FOO0 and FOO1.

If you split the data by lat-long extents, then you will need to also
create duplicates of some of the data... anything that intersects the
boundary must be duplicated...

I assume you have a table with state boundaries in it?
You could use this as an "index" to the table name. Then store a state
per table...

So when you do a spatial query, you first use the extent of the query
to find which state(s) are withing/intersecting/contain the query
area. Then you can generate one or more spatial queries to the tables
containing the states data.

> More work for me, but it is doable, no? Any insights on how to handle
> something like this?

yes

> A corollary question -- are their any speed advantages to actually
> creating multiple PostGIS instances, perhaps even splitting them
> across multiple machines? Of course, it is going to be a pain in the
> ass for me to maintain more than one instance of PostGres/PostGIS, so
> I am not thrilled at that possibility. I'd rather have a single
> instance just be managing data across multiple locations as required.

Using multiple machines means that parallel retrievals are possible.
It has to make it faster.
... but your idea of splitting data geographically probably makes this
unlikely to happen in the majority of queries.

Rather than split the data geographically, is there some property in
the tables that can be used to split it up... so that certain rows
(based on this key) go to different tables?
This may result in a better performance benefit, because a single
spatial query will hit multiple databases, returning rows in parallel.

But then again, all this depends on how you gather your data for a
query... to make what I just described work, you'd have to execute
multiple queries at once, i.e. multi-threading.

>
>
> --
> Puneet Kishor http://www.punkish.org
> Carbon Model http://carbonmodel.org
> Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
> Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
> Nelson Institute, UW-Madison http://www.nelson.wisc.edu
> ---
> Assertions are politics; backing up assertions with evidence is science
> ===
> ___
> postgis-users mailing list
> postgis-users@postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>


-- 
Brian Modra   Land line: +27 23 5411 462
Mobile: +27 79 69 77 082
5 Jan Louw Str, Prince Albert, 6930
Postal: P.O. Box 2, Prince Albert 6930
South Africa
http://www.zwartberg.com/
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread Jim Mlodgenski
On Tue, Apr 6, 2010 at 10:33 AM, P Kishor  wrote:
> I asked this question yesterday, and received a very helpful pointer
> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I
> am also investigating the possibility of splitting a single table
> across multiple disks.
>
> However, I am going to post this question in a different way in this new 
> thread.
>
> Suppose I have a table FOO0 that stores info about every state in the
> union. I know that some of these states will have mongo number of
> rows, but I don't have to build all the states immediately. So, I
> start with a few states' worth data, putting it in the default
> /usr/local/pgsql/data location.
>
> Then I start outgrowing that disk, and need to add another state, so I
> add another disk, create a new tablespace, and create a new table
> called FOO1 in this new tablespace. Then I can store the new states in
> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on,
> I can store each FOOn in a new tablespace. And, as long as I ensure
> that each FOOn table contains a geographically consistent spatial
> extent, I can build logic in my application to query the correct
> table.
>
> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to
> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon,
> I will have to query both FOO0 and FOO1.
>
> More work for me, but it is doable, no? Any insights on how to handle
> something like this?

Use table partitioning
http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html

>
> A corollary question -- are their any speed advantages to actually
> creating multiple PostGIS instances, perhaps even splitting them
> across multiple machines? Of course, it is going to be a pain in the
> ass for me to maintain more than one instance of PostGres/PostGIS, so
> I am not thrilled at that possibility. I'd rather have a single
> instance just be managing data across multiple locations as required.
>
>
> --
> Puneet Kishor http://www.punkish.org
> Carbon Model http://carbonmodel.org
> Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
> Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
> Nelson Institute, UW-Madison http://www.nelson.wisc.edu
> ---
> Assertions are politics; backing up assertions with evidence is science
> ===
> ___
> postgis-users mailing list
> postgis-users@postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>



-- 
--
Jim Mlodgenski
EnterpriseDB (http://www.enterprisedb.com)
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread Raphaël Jacquot
On Tue, 2010-04-06 at 09:33 -0500, P Kishor wrote:

> Suppose I have a table FOO0 that stores info about every state in the
> union. I know that some of these states will have mongo number of
> rows, but I don't have to build all the states immediately. So, I
> start with a few states' worth data, putting it in the default
> /usr/local/pgsql/data location.

you should use some form of hashing function to distribute the stuff
evenly


___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


[postgis-users] splitting a PostGIS db across multiple disks

2010-04-06 Thread P Kishor
I asked this question yesterday, and received a very helpful pointer
from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I
am also investigating the possibility of splitting a single table
across multiple disks.

However, I am going to post this question in a different way in this new thread.

Suppose I have a table FOO0 that stores info about every state in the
union. I know that some of these states will have mongo number of
rows, but I don't have to build all the states immediately. So, I
start with a few states' worth data, putting it in the default
/usr/local/pgsql/data location.

Then I start outgrowing that disk, and need to add another state, so I
add another disk, create a new tablespace, and create a new table
called FOO1 in this new tablespace. Then I can store the new states in
FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on,
I can store each FOOn in a new tablespace. And, as long as I ensure
that each FOOn table contains a geographically consistent spatial
extent, I can build logic in my application to query the correct
table.

So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to
-20 lon in FOO1, then if the user requests data for -5 lon to -15 lon,
I will have to query both FOO0 and FOO1.

More work for me, but it is doable, no? Any insights on how to handle
something like this?

A corollary question -- are their any speed advantages to actually
creating multiple PostGIS instances, perhaps even splitting them
across multiple machines? Of course, it is going to be a pain in the
ass for me to maintain more than one instance of PostGres/PostGIS, so
I am not thrilled at that possibility. I'd rather have a single
instance just be managing data across multiple locations as required.


-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
---
Assertions are politics; backing up assertions with evidence is science
===
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users