Re: [postgis-users] splitting a PostGIS db across multiple disks
Option #1: Hardware RAID Pros: Fast, application agnostic performance and failover. Can resize on some controllers and use a file system capable of dynamic resizing (ZFS, XFS, NTFS) Cons: Expensive controller hardware, possibly complicated file system resizing on cheaper controllers. Filesystem may be offline during resize (unless ZFS). Option #2: Software RAID Pros: Linux and Windows has built in resizing for these volumes. Protected data when using RAID 10, 5, 1. Cons: Can steal processor time when using RAID 5 & 6. Option #3: Dedicated iSCSI, AoE, FC SAN Pros: Not stealing processor from application server. Can use software RAID on SAN controller without application server performance penalty. Can be cheap if using existing server and LAN network/fabric. Cons: You're managing another server, possibly storage fabric and switches. Can get expensive. Option #4: Cheap onboard SATA JBOD RAID with resizable filesystem. Pros: Not much if any processor overhead. Can hot-add disks on most SATA boards. Cons: One disk goes and the whole dataset is shot. Option #5: Table inheritance or View with INSTEAD OF triggers on insert, delete and update. Pros: Can create your partitioned table on any disk. Cons: Complicated setup and many potential pitfalls. May not be able to take full advantage of PostgreSQL indexing in some cases. Option #6: Compressed file system Pros: Cheap space savings. Can INCREASE read times where processor is faster than the disk subsystem. Cons: May steal processor when disk subsystem is sufficiently fast (ie RAID). My suggestion: Use a filesystem that allows online resizing (EXT3+, XFS, etc..) on a Hardware/Software RAID that allows hot drive addition. This will get you your 99.9% uptime. >-Original Message- >From: postgis-users-boun...@postgis.refractions.net >[mailto:postgis-users-boun...@postgis.refractions.net] On >Behalf Of P Kishor >Sent: Tuesday, April 06, 2010 7:34 AM >To: PostGIS Users Discussion >Subject: [postgis-users] splitting a PostGIS db across multiple disks > > >I asked this question yesterday, and received a very helpful >pointer from Ben Madin re. TABLESPACES. As noted in my reply >in that thread, I am also investigating the possibility of >splitting a single table across multiple disks. > >However, I am going to post this question in a different way >in this new thread. > >Suppose I have a table FOO0 that stores info about every state >in the union. I know that some of these states will have mongo >number of rows, but I don't have to build all the states >immediately. So, I start with a few states' worth data, >putting it in the default /usr/local/pgsql/data location. > >Then I start outgrowing that disk, and need to add another >state, so I add another disk, create a new tablespace, and >create a new table called FOO1 in this new tablespace. Then I >can store the new states in FOO1. As long as I break up my >table into FOO0, FOO1, FOO2, and so on, I can store each FOOn >in a new tablespace. And, as long as I ensure that each FOOn >table contains a geographically consistent spatial extent, I >can build logic in my application to query the correct table. > >So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 >lon to -20 lon in FOO1, then if the user requests data for -5 >lon to -15 lon, I will have to query both FOO0 and FOO1. > >More work for me, but it is doable, no? Any insights on how to >handle something like this? > >A corollary question -- are their any speed advantages to >actually creating multiple PostGIS instances, perhaps even >splitting them across multiple machines? Of course, it is >going to be a pain in the ass for me to maintain more than one >instance of PostGres/PostGIS, so I am not thrilled at that >possibility. I'd rather have a single instance just be >managing data across multiple locations as required. > > >-- >Puneet Kishor http://www.punkish.org >Carbon Model http://carbonmodel.org >Charter Member, Open Source Geospatial Foundation >http://www.osgeo.org Science Commons Fellow, >http://sciencecommons.org/about/whoweare/kishor >Nelson Institute, UW-Madison http://www.nelson.wisc.edu >--- >Assertions are politics; backing up assertions with evidence >is science >=== >___ >postgis-users mailing list postgis-users@postgis.refractions.net >http://postgis.refractions.net/mailman/listinfo/postgis-users > ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] splitting a PostGIS db across multiple disks
On 06/04/2010, P Kishor wrote: > On Tue, Apr 6, 2010 at 10:22 AM, Chris Hermansen > wrote: >> What about using RAID? No data reorganization necessary. >> > > I am already using a RAID, and conceptually, a RAID is the same as a > single disk. My issue arises when I run out of space on RAID as well. > While I could extend the RAID, and theoretically never have any space > limitation, I would still have to use TABLESPACES, just so I can store > part of the db in another location on the expanded RAID space, and > also have to utilize table partitioning, because, at the db level, I > am dealing with a single table that is growing beyond what is > available on the RAID. > > I am assuming that adding more disks to a RAID is not a simple task, > because the RAID set has to be migrated to incorporate the newly added > disk... I tried a RAID migration yesterday, and it takes a long time. > So, tablespaces are essential. Table partitioning would allow me to > deal with a single conceptual table, so my application would not be > riddled with junky logic (if querying this geog. area, query that > table, if querying that geog. area, query that other table and such > nonsense). > > Wrt the dataset itself... it is all point data. My current estimate is > that the raw dataset is going to be about 125 billion rows in a table. > In terms of disk space, it will be about 4 TB raw, that is, before it > goes into the db. Indexes and other db overhead might 1.5x or 2x it, > so, say, max 8 TB. This may cause a lot of discussion... but in my opinion (though I posted a few minutes ago answering a comment about RAID) ... my opinion is that if you can design your own software, then design out RAID. RAID is required because software needs lots of space, and disks are not big enough. Its a compromise. Then add more disks, you increase the probability of failure, so you have to build in redundancy... you need expensive hardware RAID controllers, you need 3U or bigger servers... cost goes up... If you can split your dataset up logically, across multiple disks, and (even better) multiple hosts... then it will be faster, you isolate the chance of failure to just one section of your service (rather than the whole service), and you can use less expensive servers, or get a blade cabinet and really squeeze them in to use less server room space and therefore less rental costs. But... this assumes that you are at the point where you can design your own software. I've been in this position (being able to design my own PostGis - based service) and this is how I did it, and how I'd do it again (except if anything, more so). It makes sense to me to use small servers, (relatively) small disks, and smaller service applications. > >> http://wiki.postgresql.org/wiki/Community_Disk_Tuning_Guide >> >> As the above notes, disk performance (eg spindle speed) is important. >> >> Also, to get good performance, you need to understand your application's >> performance needs. Do you have a lot of small queries, etc etc. If you >> don't understand your application's performance needs, then you may be >> better off running your application for awhile and measuring its >> bottlenecks before you decide on a storage strategy. >> >> P Kishor wrote: >>> I asked this question yesterday, and received a very helpful pointer >>> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I >>> am also investigating the possibility of splitting a single table >>> across multiple disks. >>> >>> However, I am going to post this question in a different way in this new >>> thread. >>> >>> Suppose I have a table FOO0 that stores info about every state in the >>> union. I know that some of these states will have mongo number of >>> rows, but I don't have to build all the states immediately. So, I >>> start with a few states' worth data, putting it in the default >>> /usr/local/pgsql/data location. >>> >>> Then I start outgrowing that disk, and need to add another state, so I >>> add another disk, create a new tablespace, and create a new table >>> called FOO1 in this new tablespace. Then I can store the new states in >>> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on, >>> I can store each FOOn in a new tablespace. And, as long as I ensure >>> that each FOOn table contains a geographically consistent spatial >>> extent, I can build logic in my application to query the correct >>> table. >>> >>> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to >>> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon, >>> I will have to query both FOO0 and FOO1. >>> >>> More work for me, but it is doable, no? Any insights on how to handle >>> something like this? >>> >>> A corollary question -- are their any speed advantages to actually >>> creating multiple PostGIS instances, perhaps even splitting them >>> across multiple machines? Of course, it is going to be a pain in the >>> ass for me to maintain more than one inst
Re: [postgis-users] splitting a PostGIS db across multiple disks
A big database, that's for sure. You said in an earlier mail that you had a 1Tb drive. one 1Tb drive is not RAID. If you think you're going to need 8Tb, then with the current price of storage - about $100 per quality Tb - why not just get your storage, configure as RAID 10 or something like that, and be done with it? No adding disks, no time spent reconfiguring every time you add one, etc etc. Plus you get redundancy. If you partition your tables, you need to have a key value of some kind to organize the partitioning. A bad choice of keys may lead to a few of your disks getting more access than others which will give you less optimal performance, or it may lead to you having stuff pile up on one disk and eventually exhaust the space there. Either of which could require you to re-organize your keys and data, which would be expensive and time consuming. P Kishor wrote: > On Tue, Apr 6, 2010 at 10:22 AM, Chris Hermansen > wrote: > >> What about using RAID? No data reorganization necessary. >> >> > > I am already using a RAID, and conceptually, a RAID is the same as a > single disk. My issue arises when I run out of space on RAID as well. > While I could extend the RAID, and theoretically never have any space > limitation, I would still have to use TABLESPACES, just so I can store > part of the db in another location on the expanded RAID space, and > also have to utilize table partitioning, because, at the db level, I > am dealing with a single table that is growing beyond what is > available on the RAID. > > I am assuming that adding more disks to a RAID is not a simple task, > because the RAID set has to be migrated to incorporate the newly added > disk... I tried a RAID migration yesterday, and it takes a long time. > So, tablespaces are essential. Table partitioning would allow me to > deal with a single conceptual table, so my application would not be > riddled with junky logic (if querying this geog. area, query that > table, if querying that geog. area, query that other table and such > nonsense). > > Wrt the dataset itself... it is all point data. My current estimate is > that the raw dataset is going to be about 125 billion rows in a table. > In terms of disk space, it will be about 4 TB raw, that is, before it > goes into the db. Indexes and other db overhead might 1.5x or 2x it, > so, say, max 8 TB. > > > >> http://wiki.postgresql.org/wiki/Community_Disk_Tuning_Guide >> >> As the above notes, disk performance (eg spindle speed) is important. >> >> Also, to get good performance, you need to understand your application's >> performance needs. Do you have a lot of small queries, etc etc. If you >> don't understand your application's performance needs, then you may be >> better off running your application for awhile and measuring its >> bottlenecks before you decide on a storage strategy. >> >> P Kishor wrote: >> >>> I asked this question yesterday, and received a very helpful pointer >>> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I >>> am also investigating the possibility of splitting a single table >>> across multiple disks. >>> >>> However, I am going to post this question in a different way in this new >>> thread. >>> >>> Suppose I have a table FOO0 that stores info about every state in the >>> union. I know that some of these states will have mongo number of >>> rows, but I don't have to build all the states immediately. So, I >>> start with a few states' worth data, putting it in the default >>> /usr/local/pgsql/data location. >>> >>> Then I start outgrowing that disk, and need to add another state, so I >>> add another disk, create a new tablespace, and create a new table >>> called FOO1 in this new tablespace. Then I can store the new states in >>> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on, >>> I can store each FOOn in a new tablespace. And, as long as I ensure >>> that each FOOn table contains a geographically consistent spatial >>> extent, I can build logic in my application to query the correct >>> table. >>> >>> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to >>> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon, >>> I will have to query both FOO0 and FOO1. >>> >>> More work for me, but it is doable, no? Any insights on how to handle >>> something like this? >>> >>> A corollary question -- are their any speed advantages to actually >>> creating multiple PostGIS instances, perhaps even splitting them >>> across multiple machines? Of course, it is going to be a pain in the >>> ass for me to maintain more than one instance of PostGres/PostGIS, so >>> I am not thrilled at that possibility. I'd rather have a single >>> instance just be managing data across multiple locations as required. >>> >>> >>> >>> >> -- >> Regards, >> >> Chris Hermansen·mailto:chris.herman...@timberline.ca >> tel+1.604.714.2878 · fax+1.604.733.0631 · mob+1.778.840.4625 >> Timberli
Re: [postgis-users] splitting a PostGIS db across multiple disks
On Tue, Apr 6, 2010 at 1:12 PM, P Kishor wrote: > On Tue, Apr 6, 2010 at 10:22 AM, Chris Hermansen > wrote: >> What about using RAID? No data reorganization necessary. >> > > I am already using a RAID, and conceptually, a RAID is the same as a > single disk. My issue arises when I run out of space on RAID as well. I guess I could render all this discussion moot by going with a Drobo. The Drobo with its "BeyondRAID" technology allows me to add drives as time goes by, as my space need grows, as I acquire more funds to purchase more hardware. To the installed PostGres/PostGIS instance, it just appears as a logical drive that magically gets bigger when needed. No RAID set migration, no table partitioning, no tablespaces (unless absolutely required for performance purpose). Anyone have any experience with speed when running a database on a Drobo? They are selling a two-pack of Drobo with 32 TB total for under $6000. http://www.drobostore.com:80/servlet/ControllerServlet?Action=DisplayPage&Env=BASE&Locale=en_US&SiteID=drobo&id=ShoppingCartPage > While I could extend the RAID, and theoretically never have any space > limitation, I would still have to use TABLESPACES, just so I can store > part of the db in another location on the expanded RAID space, and > also have to utilize table partitioning, because, at the db level, I > am dealing with a single table that is growing beyond what is > available on the RAID. > > I am assuming that adding more disks to a RAID is not a simple task, > because the RAID set has to be migrated to incorporate the newly added > disk... I tried a RAID migration yesterday, and it takes a long time. > So, tablespaces are essential. Table partitioning would allow me to > deal with a single conceptual table, so my application would not be > riddled with junky logic (if querying this geog. area, query that > table, if querying that geog. area, query that other table and such > nonsense). > > Wrt the dataset itself... it is all point data. My current estimate is > that the raw dataset is going to be about 125 billion rows in a table. > In terms of disk space, it will be about 4 TB raw, that is, before it > goes into the db. Indexes and other db overhead might 1.5x or 2x it, > so, say, max 8 TB. > > >> http://wiki.postgresql.org/wiki/Community_Disk_Tuning_Guide >> >> As the above notes, disk performance (eg spindle speed) is important. >> >> Also, to get good performance, you need to understand your application's >> performance needs. Do you have a lot of small queries, etc etc. If you >> don't understand your application's performance needs, then you may be >> better off running your application for awhile and measuring its >> bottlenecks before you decide on a storage strategy. >> >> P Kishor wrote: >>> I asked this question yesterday, and received a very helpful pointer >>> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I >>> am also investigating the possibility of splitting a single table >>> across multiple disks. >>> >>> However, I am going to post this question in a different way in this new >>> thread. >>> >>> Suppose I have a table FOO0 that stores info about every state in the >>> union. I know that some of these states will have mongo number of >>> rows, but I don't have to build all the states immediately. So, I >>> start with a few states' worth data, putting it in the default >>> /usr/local/pgsql/data location. >>> >>> Then I start outgrowing that disk, and need to add another state, so I >>> add another disk, create a new tablespace, and create a new table >>> called FOO1 in this new tablespace. Then I can store the new states in >>> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on, >>> I can store each FOOn in a new tablespace. And, as long as I ensure >>> that each FOOn table contains a geographically consistent spatial >>> extent, I can build logic in my application to query the correct >>> table. >>> >>> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to >>> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon, >>> I will have to query both FOO0 and FOO1. >>> >>> More work for me, but it is doable, no? Any insights on how to handle >>> something like this? >>> >>> A corollary question -- are their any speed advantages to actually >>> creating multiple PostGIS instances, perhaps even splitting them >>> across multiple machines? Of course, it is going to be a pain in the >>> ass for me to maintain more than one instance of PostGres/PostGIS, so >>> I am not thrilled at that possibility. I'd rather have a single >>> instance just be managing data across multiple locations as required. >>> >>> >>> >> >> -- >> Regards, >> >> Chris Hermansen · mailto:chris.herman...@timberline.ca >> tel+1.604.714.2878 · fax+1.604.733.0631 · mob+1.778.840.4625 >> Timberline Natural Resource Group · http://www.timberline.ca >> 401 · 958 West 8th Avenue · Vancouver BC · Canada · V5Z 1E5 >> >> __
Re: [postgis-users] splitting a PostGIS db across multiple disks
On Tue, Apr 6, 2010 at 10:22 AM, Chris Hermansen wrote: > What about using RAID? No data reorganization necessary. > I am already using a RAID, and conceptually, a RAID is the same as a single disk. My issue arises when I run out of space on RAID as well. While I could extend the RAID, and theoretically never have any space limitation, I would still have to use TABLESPACES, just so I can store part of the db in another location on the expanded RAID space, and also have to utilize table partitioning, because, at the db level, I am dealing with a single table that is growing beyond what is available on the RAID. I am assuming that adding more disks to a RAID is not a simple task, because the RAID set has to be migrated to incorporate the newly added disk... I tried a RAID migration yesterday, and it takes a long time. So, tablespaces are essential. Table partitioning would allow me to deal with a single conceptual table, so my application would not be riddled with junky logic (if querying this geog. area, query that table, if querying that geog. area, query that other table and such nonsense). Wrt the dataset itself... it is all point data. My current estimate is that the raw dataset is going to be about 125 billion rows in a table. In terms of disk space, it will be about 4 TB raw, that is, before it goes into the db. Indexes and other db overhead might 1.5x or 2x it, so, say, max 8 TB. > http://wiki.postgresql.org/wiki/Community_Disk_Tuning_Guide > > As the above notes, disk performance (eg spindle speed) is important. > > Also, to get good performance, you need to understand your application's > performance needs. Do you have a lot of small queries, etc etc. If you > don't understand your application's performance needs, then you may be > better off running your application for awhile and measuring its > bottlenecks before you decide on a storage strategy. > > P Kishor wrote: >> I asked this question yesterday, and received a very helpful pointer >> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I >> am also investigating the possibility of splitting a single table >> across multiple disks. >> >> However, I am going to post this question in a different way in this new >> thread. >> >> Suppose I have a table FOO0 that stores info about every state in the >> union. I know that some of these states will have mongo number of >> rows, but I don't have to build all the states immediately. So, I >> start with a few states' worth data, putting it in the default >> /usr/local/pgsql/data location. >> >> Then I start outgrowing that disk, and need to add another state, so I >> add another disk, create a new tablespace, and create a new table >> called FOO1 in this new tablespace. Then I can store the new states in >> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on, >> I can store each FOOn in a new tablespace. And, as long as I ensure >> that each FOOn table contains a geographically consistent spatial >> extent, I can build logic in my application to query the correct >> table. >> >> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to >> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon, >> I will have to query both FOO0 and FOO1. >> >> More work for me, but it is doable, no? Any insights on how to handle >> something like this? >> >> A corollary question -- are their any speed advantages to actually >> creating multiple PostGIS instances, perhaps even splitting them >> across multiple machines? Of course, it is going to be a pain in the >> ass for me to maintain more than one instance of PostGres/PostGIS, so >> I am not thrilled at that possibility. I'd rather have a single >> instance just be managing data across multiple locations as required. >> >> >> > > -- > Regards, > > Chris Hermansen · mailto:chris.herman...@timberline.ca > tel+1.604.714.2878 · fax+1.604.733.0631 · mob+1.778.840.4625 > Timberline Natural Resource Group · http://www.timberline.ca > 401 · 958 West 8th Avenue · Vancouver BC · Canada · V5Z 1E5 > > ___ > postgis-users mailing list > postgis-users@postgis.refractions.net > http://postgis.refractions.net/mailman/listinfo/postgis-users > -- Puneet Kishor http://www.punkish.org Carbon Model http://carbonmodel.org Charter Member, Open Source Geospatial Foundation http://www.osgeo.org Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor Nelson Institute, UW-Madison http://www.nelson.wisc.edu --- Assertions are politics; backing up assertions with evidence is science === ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] splitting a PostGIS db across multiple disks
On 06/04/2010, Chris Hermansen wrote: > What about using RAID? No data reorganization necessary. > > http://wiki.postgresql.org/wiki/Community_Disk_Tuning_Guide > > As the above notes, disk performance (eg spindle speed) is important. > > Also, to get good performance, you need to understand your application's > performance needs. Do you have a lot of small queries, etc etc. If you > don't understand your application's performance needs, then you may be > better off running your application for awhile and measuring its > bottlenecks before you decide on a storage strategy. very good points. Regarding RAID, as a general rule of thumb, RAID 1+0 is best for a database, also called RAID 10. To configure a good RAID system, make sure you have a multi-processor mother-board, lots of memory, and an Adaptec RAID controller (hardware RAID, not software RAID.) > > P Kishor wrote: >> I asked this question yesterday, and received a very helpful pointer >> from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I >> am also investigating the possibility of splitting a single table >> across multiple disks. >> >> However, I am going to post this question in a different way in this new >> thread. >> >> Suppose I have a table FOO0 that stores info about every state in the >> union. I know that some of these states will have mongo number of >> rows, but I don't have to build all the states immediately. So, I >> start with a few states' worth data, putting it in the default >> /usr/local/pgsql/data location. >> >> Then I start outgrowing that disk, and need to add another state, so I >> add another disk, create a new tablespace, and create a new table >> called FOO1 in this new tablespace. Then I can store the new states in >> FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on, >> I can store each FOOn in a new tablespace. And, as long as I ensure >> that each FOOn table contains a geographically consistent spatial >> extent, I can build logic in my application to query the correct >> table. >> >> So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to >> -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon, >> I will have to query both FOO0 and FOO1. >> >> More work for me, but it is doable, no? Any insights on how to handle >> something like this? >> >> A corollary question -- are their any speed advantages to actually >> creating multiple PostGIS instances, perhaps even splitting them >> across multiple machines? Of course, it is going to be a pain in the >> ass for me to maintain more than one instance of PostGres/PostGIS, so >> I am not thrilled at that possibility. I'd rather have a single >> instance just be managing data across multiple locations as required. >> >> >> > > -- > Regards, > > Chris Hermansen·mailto:chris.herman...@timberline.ca > tel+1.604.714.2878 · fax+1.604.733.0631 · mob+1.778.840.4625 > Timberline Natural Resource Group · http://www.timberline.ca > 401 · 958 West 8th Avenue · Vancouver BC · Canada · V5Z 1E5 > > ___ > postgis-users mailing list > postgis-users@postgis.refractions.net > http://postgis.refractions.net/mailman/listinfo/postgis-users > -- Brian Modra Land line: +27 23 5411 462 Mobile: +27 79 69 77 082 5 Jan Louw Str, Prince Albert, 6930 Postal: P.O. Box 2, Prince Albert 6930 South Africa http://www.zwartberg.com/ ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] splitting a PostGIS db across multiple disks
Puneet I'd wonder if a view can do it off multiple tables ? Having said that, you might want to check the archives for inheritance. I recall someone ?Leo? mentioning something similar when I was talking about keeping an archive table without the current data, but being able to look up data from both as a way of dealing with Change of Support issues. I think you can have an inherited table with no data inheriting from a number of tables. You can stop the query searching all tables by having constraints on them - maybe they could reflect bounding boxes? cheers Ben On 06/04/2010, at 22:33 , P Kishor wrote: > I asked this question yesterday, and received a very helpful pointer > from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I > am also investigating the possibility of splitting a single table > across multiple disks. > > However, I am going to post this question in a different way in this new > thread. > > Suppose I have a table FOO0 that stores info about every state in the > union. I know that some of these states will have mongo number of > rows, but I don't have to build all the states immediately. So, I > start with a few states' worth data, putting it in the default > /usr/local/pgsql/data location. > > Then I start outgrowing that disk, and need to add another state, so I > add another disk, create a new tablespace, and create a new table > called FOO1 in this new tablespace. Then I can store the new states in > FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on, > I can store each FOOn in a new tablespace. And, as long as I ensure > that each FOOn table contains a geographically consistent spatial > extent, I can build logic in my application to query the correct > table. > > So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to > -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon, > I will have to query both FOO0 and FOO1. > > More work for me, but it is doable, no? Any insights on how to handle > something like this? > > A corollary question -- are their any speed advantages to actually > creating multiple PostGIS instances, perhaps even splitting them > across multiple machines? Of course, it is going to be a pain in the > ass for me to maintain more than one instance of PostGres/PostGIS, so > I am not thrilled at that possibility. I'd rather have a single > instance just be managing data across multiple locations as required. > > > -- > Puneet Kishor http://www.punkish.org > Carbon Model http://carbonmodel.org > Charter Member, Open Source Geospatial Foundation http://www.osgeo.org > Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor > Nelson Institute, UW-Madison http://www.nelson.wisc.edu > --- > Assertions are politics; backing up assertions with evidence is science > === > ___ > postgis-users mailing list > postgis-users@postgis.refractions.net > http://postgis.refractions.net/mailman/listinfo/postgis-users ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] splitting a PostGIS db across multiple disks
What about using RAID? No data reorganization necessary. http://wiki.postgresql.org/wiki/Community_Disk_Tuning_Guide As the above notes, disk performance (eg spindle speed) is important. Also, to get good performance, you need to understand your application's performance needs. Do you have a lot of small queries, etc etc. If you don't understand your application's performance needs, then you may be better off running your application for awhile and measuring its bottlenecks before you decide on a storage strategy. P Kishor wrote: > I asked this question yesterday, and received a very helpful pointer > from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I > am also investigating the possibility of splitting a single table > across multiple disks. > > However, I am going to post this question in a different way in this new > thread. > > Suppose I have a table FOO0 that stores info about every state in the > union. I know that some of these states will have mongo number of > rows, but I don't have to build all the states immediately. So, I > start with a few states' worth data, putting it in the default > /usr/local/pgsql/data location. > > Then I start outgrowing that disk, and need to add another state, so I > add another disk, create a new tablespace, and create a new table > called FOO1 in this new tablespace. Then I can store the new states in > FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on, > I can store each FOOn in a new tablespace. And, as long as I ensure > that each FOOn table contains a geographically consistent spatial > extent, I can build logic in my application to query the correct > table. > > So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to > -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon, > I will have to query both FOO0 and FOO1. > > More work for me, but it is doable, no? Any insights on how to handle > something like this? > > A corollary question -- are their any speed advantages to actually > creating multiple PostGIS instances, perhaps even splitting them > across multiple machines? Of course, it is going to be a pain in the > ass for me to maintain more than one instance of PostGres/PostGIS, so > I am not thrilled at that possibility. I'd rather have a single > instance just be managing data across multiple locations as required. > > > -- Regards, Chris Hermansen·mailto:chris.herman...@timberline.ca tel+1.604.714.2878 · fax+1.604.733.0631 · mob+1.778.840.4625 Timberline Natural Resource Group · http://www.timberline.ca 401 · 958 West 8th Avenue · Vancouver BC · Canada · V5Z 1E5 ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] splitting a PostGIS db across multiple disks
You can make multiple disks act like one using a method called striping. Here is the first link I googled. Apparently it will increase performance as well... http://insights.oetiker.ch/linux/raidoptimization/ On Tue, Apr 6, 2010 at 10:58 AM, Brian Modra wrote: > On 06/04/2010, P Kishor wrote: > > I asked this question yesterday, and received a very helpful pointer > > from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I > > am also investigating the possibility of splitting a single table > > across multiple disks. > > > > However, I am going to post this question in a different way in this new > > thread. > > > > Suppose I have a table FOO0 that stores info about every state in the > > union. I know that some of these states will have mongo number of > > rows, but I don't have to build all the states immediately. So, I > > start with a few states' worth data, putting it in the default > > /usr/local/pgsql/data location. > > > > Then I start outgrowing that disk, and need to add another state, so I > > add another disk, create a new tablespace, and create a new table > > called FOO1 in this new tablespace. Then I can store the new states in > > FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on, > > I can store each FOOn in a new tablespace. And, as long as I ensure > > that each FOOn table contains a geographically consistent spatial > > extent, I can build logic in my application to query the correct > > table. > > > > So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to > > -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon, > > I will have to query both FOO0 and FOO1. > > If you split the data by lat-long extents, then you will need to also > create duplicates of some of the data... anything that intersects the > boundary must be duplicated... > > I assume you have a table with state boundaries in it? > You could use this as an "index" to the table name. Then store a state > per table... > > So when you do a spatial query, you first use the extent of the query > to find which state(s) are withing/intersecting/contain the query > area. Then you can generate one or more spatial queries to the tables > containing the states data. > > > More work for me, but it is doable, no? Any insights on how to handle > > something like this? > > yes > > > A corollary question -- are their any speed advantages to actually > > creating multiple PostGIS instances, perhaps even splitting them > > across multiple machines? Of course, it is going to be a pain in the > > ass for me to maintain more than one instance of PostGres/PostGIS, so > > I am not thrilled at that possibility. I'd rather have a single > > instance just be managing data across multiple locations as required. > > Using multiple machines means that parallel retrievals are possible. > It has to make it faster. > ... but your idea of splitting data geographically probably makes this > unlikely to happen in the majority of queries. > > Rather than split the data geographically, is there some property in > the tables that can be used to split it up... so that certain rows > (based on this key) go to different tables? > This may result in a better performance benefit, because a single > spatial query will hit multiple databases, returning rows in parallel. > > But then again, all this depends on how you gather your data for a > query... to make what I just described work, you'd have to execute > multiple queries at once, i.e. multi-threading. > > > > > > > -- > > Puneet Kishor http://www.punkish.org > > Carbon Model http://carbonmodel.org > > Charter Member, Open Source Geospatial Foundation http://www.osgeo.org > > Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor > > Nelson Institute, UW-Madison http://www.nelson.wisc.edu > > --- > > Assertions are politics; backing up assertions with evidence is science > > === > > ___ > > postgis-users mailing list > > postgis-users@postgis.refractions.net > > http://postgis.refractions.net/mailman/listinfo/postgis-users > > > > > -- > Brian Modra Land line: +27 23 5411 462 > Mobile: +27 79 69 77 082 > 5 Jan Louw Str, Prince Albert, 6930 > Postal: P.O. Box 2, Prince Albert 6930 > South Africa > http://www.zwartberg.com/ > ___ > postgis-users mailing list > postgis-users@postgis.refractions.net > http://postgis.refractions.net/mailman/listinfo/postgis-users > -- Cheers! Rick ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] splitting a PostGIS db across multiple disks
On 06/04/2010, P Kishor wrote: > I asked this question yesterday, and received a very helpful pointer > from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I > am also investigating the possibility of splitting a single table > across multiple disks. > > However, I am going to post this question in a different way in this new > thread. > > Suppose I have a table FOO0 that stores info about every state in the > union. I know that some of these states will have mongo number of > rows, but I don't have to build all the states immediately. So, I > start with a few states' worth data, putting it in the default > /usr/local/pgsql/data location. > > Then I start outgrowing that disk, and need to add another state, so I > add another disk, create a new tablespace, and create a new table > called FOO1 in this new tablespace. Then I can store the new states in > FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on, > I can store each FOOn in a new tablespace. And, as long as I ensure > that each FOOn table contains a geographically consistent spatial > extent, I can build logic in my application to query the correct > table. > > So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to > -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon, > I will have to query both FOO0 and FOO1. If you split the data by lat-long extents, then you will need to also create duplicates of some of the data... anything that intersects the boundary must be duplicated... I assume you have a table with state boundaries in it? You could use this as an "index" to the table name. Then store a state per table... So when you do a spatial query, you first use the extent of the query to find which state(s) are withing/intersecting/contain the query area. Then you can generate one or more spatial queries to the tables containing the states data. > More work for me, but it is doable, no? Any insights on how to handle > something like this? yes > A corollary question -- are their any speed advantages to actually > creating multiple PostGIS instances, perhaps even splitting them > across multiple machines? Of course, it is going to be a pain in the > ass for me to maintain more than one instance of PostGres/PostGIS, so > I am not thrilled at that possibility. I'd rather have a single > instance just be managing data across multiple locations as required. Using multiple machines means that parallel retrievals are possible. It has to make it faster. ... but your idea of splitting data geographically probably makes this unlikely to happen in the majority of queries. Rather than split the data geographically, is there some property in the tables that can be used to split it up... so that certain rows (based on this key) go to different tables? This may result in a better performance benefit, because a single spatial query will hit multiple databases, returning rows in parallel. But then again, all this depends on how you gather your data for a query... to make what I just described work, you'd have to execute multiple queries at once, i.e. multi-threading. > > > -- > Puneet Kishor http://www.punkish.org > Carbon Model http://carbonmodel.org > Charter Member, Open Source Geospatial Foundation http://www.osgeo.org > Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor > Nelson Institute, UW-Madison http://www.nelson.wisc.edu > --- > Assertions are politics; backing up assertions with evidence is science > === > ___ > postgis-users mailing list > postgis-users@postgis.refractions.net > http://postgis.refractions.net/mailman/listinfo/postgis-users > -- Brian Modra Land line: +27 23 5411 462 Mobile: +27 79 69 77 082 5 Jan Louw Str, Prince Albert, 6930 Postal: P.O. Box 2, Prince Albert 6930 South Africa http://www.zwartberg.com/ ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] splitting a PostGIS db across multiple disks
On Tue, Apr 6, 2010 at 10:33 AM, P Kishor wrote: > I asked this question yesterday, and received a very helpful pointer > from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I > am also investigating the possibility of splitting a single table > across multiple disks. > > However, I am going to post this question in a different way in this new > thread. > > Suppose I have a table FOO0 that stores info about every state in the > union. I know that some of these states will have mongo number of > rows, but I don't have to build all the states immediately. So, I > start with a few states' worth data, putting it in the default > /usr/local/pgsql/data location. > > Then I start outgrowing that disk, and need to add another state, so I > add another disk, create a new tablespace, and create a new table > called FOO1 in this new tablespace. Then I can store the new states in > FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on, > I can store each FOOn in a new tablespace. And, as long as I ensure > that each FOOn table contains a geographically consistent spatial > extent, I can build logic in my application to query the correct > table. > > So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to > -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon, > I will have to query both FOO0 and FOO1. > > More work for me, but it is doable, no? Any insights on how to handle > something like this? Use table partitioning http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html > > A corollary question -- are their any speed advantages to actually > creating multiple PostGIS instances, perhaps even splitting them > across multiple machines? Of course, it is going to be a pain in the > ass for me to maintain more than one instance of PostGres/PostGIS, so > I am not thrilled at that possibility. I'd rather have a single > instance just be managing data across multiple locations as required. > > > -- > Puneet Kishor http://www.punkish.org > Carbon Model http://carbonmodel.org > Charter Member, Open Source Geospatial Foundation http://www.osgeo.org > Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor > Nelson Institute, UW-Madison http://www.nelson.wisc.edu > --- > Assertions are politics; backing up assertions with evidence is science > === > ___ > postgis-users mailing list > postgis-users@postgis.refractions.net > http://postgis.refractions.net/mailman/listinfo/postgis-users > -- -- Jim Mlodgenski EnterpriseDB (http://www.enterprisedb.com) ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] splitting a PostGIS db across multiple disks
On Tue, 2010-04-06 at 09:33 -0500, P Kishor wrote: > Suppose I have a table FOO0 that stores info about every state in the > union. I know that some of these states will have mongo number of > rows, but I don't have to build all the states immediately. So, I > start with a few states' worth data, putting it in the default > /usr/local/pgsql/data location. you should use some form of hashing function to distribute the stuff evenly ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
[postgis-users] splitting a PostGIS db across multiple disks
I asked this question yesterday, and received a very helpful pointer from Ben Madin re. TABLESPACES. As noted in my reply in that thread, I am also investigating the possibility of splitting a single table across multiple disks. However, I am going to post this question in a different way in this new thread. Suppose I have a table FOO0 that stores info about every state in the union. I know that some of these states will have mongo number of rows, but I don't have to build all the states immediately. So, I start with a few states' worth data, putting it in the default /usr/local/pgsql/data location. Then I start outgrowing that disk, and need to add another state, so I add another disk, create a new tablespace, and create a new table called FOO1 in this new tablespace. Then I can store the new states in FOO1. As long as I break up my table into FOO0, FOO1, FOO2, and so on, I can store each FOOn in a new tablespace. And, as long as I ensure that each FOOn table contains a geographically consistent spatial extent, I can build logic in my application to query the correct table. So, lets say 0 lon to -10 lon data are stored in FOO0, and -10 lon to -20 lon in FOO1, then if the user requests data for -5 lon to -15 lon, I will have to query both FOO0 and FOO1. More work for me, but it is doable, no? Any insights on how to handle something like this? A corollary question -- are their any speed advantages to actually creating multiple PostGIS instances, perhaps even splitting them across multiple machines? Of course, it is going to be a pain in the ass for me to maintain more than one instance of PostGres/PostGIS, so I am not thrilled at that possibility. I'd rather have a single instance just be managing data across multiple locations as required. -- Puneet Kishor http://www.punkish.org Carbon Model http://carbonmodel.org Charter Member, Open Source Geospatial Foundation http://www.osgeo.org Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor Nelson Institute, UW-Madison http://www.nelson.wisc.edu --- Assertions are politics; backing up assertions with evidence is science === ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users