Re: [GENERAL] Two billion records ok?
Nick Bower wrote:
> We're considering using PostgreSQL for storing gridded metadata - each point
> of our grids has a variety of metadata attached to it (including lat/lon,
> measurements, etc) and would constitute a record in PostgreSQL+PostGIS.
> Size-wise, grids are about 4000x700 and are collected twice daily over say
> 10 years. As mentioned, each record would have up to 50 metadata attributes
> (columns) including geom, floats, varchars etc.
>
> So given 4000x700x2x365x10 ≈ 2 billion, is this going to be a problem if we
> will be wanting to query on datetimes, PostGIS lat/lon, and integer-based
> metadata flags?

Hmmm... 2 billion looks optimistic. I get 2.044e+10, which is 20 billion.

I'm currently working with a table of over 200,000,000 records. With a clustered index underlying partitioning, response times are more than acceptable: a 25-wide self relation (left outer join) over 3 months of data (records arrive at around 40/minute) takes about 2 minutes, and a simple query with a WHERE clause on timestamp takes a few seconds at most. This is on a 32-bit Intel system with only 2GB of memory and mirrored 7200RPM SATA hard drives.

I'd suggest partitioning on timestamp, maybe per year at least, and using a clustered index on timestamp. It should be viable if your hardware is up to it. I'd also strongly recommend a suitable platform: 64-bit Linux on AMD64 or Opteron, with as much memory and the fastest RAID setup you can afford. Make sure you use a fully 64-bit build of PostgreSQL/PostGIS on this platform as well.

If the same grid is being resampled, then a separate table defining the grid, and a join on grid ID to the main (partitioned) table, may improve performance (it reduces data volume).

I wouldn't expect instantaneous answers from it, but I don't know of any reason it won't work. It depends very much on what level of performance is considered acceptable. I'm also using PostGIS grids with up to a few million cells, and spatial joins to a million or so tracklines, to generate gridded models for analysis in R. You seem to be scaling this up from where I'm at, so I'd like to know how you get on.

Cheers,
Brent Wood
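[A rough sketch of the yearly-partition-plus-clustered-index approach suggested above. Table and column names are illustrative, not Nick's actual schema; the geometry column assumes PostGIS is installed, and CLUSTER ... USING is the newer syntax, so adjust for your PostgreSQL version.]

  -- Parent table for the gridded observations (hypothetical names).
  CREATE TABLE grid_obs (
      obs_time    timestamp NOT NULL,
      grid_id     integer   NOT NULL,
      geom        geometry,              -- PostGIS point for lat/lon
      qa_flag     integer,
      measurement double precision
  );

  -- One child table per year; the CHECK constraint lets the planner skip
  -- irrelevant years when constraint_exclusion is on and queries filter
  -- on obs_time.
  CREATE TABLE grid_obs_2006 (
      CHECK (obs_time >= DATE '2006-01-01' AND obs_time < DATE '2007-01-01')
  ) INHERITS (grid_obs);

  -- Index each child on the timestamp, then physically order the table on
  -- that index so range scans on obs_time stay largely sequential on disk.
  CREATE INDEX grid_obs_2006_time_idx ON grid_obs_2006 (obs_time);
  CLUSTER grid_obs_2006 USING grid_obs_2006_time_idx;

  SET constraint_exclusion = on;

[The separate grid-definition table Brent mentions would then hold one row per cell, keyed by grid_id, so per-cell attributes are not repeated in every daily observation row.]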
Re: [GENERAL] Two billion records ok?
Nick Bower wrote:
> Thanks - but what do you call big?

How many stars do you think there are? :-)

> My application is satellite data btw so the reference could be useful.
>
> On Tue, 5 Sep 2006 01:40 pm, Oleg Bartunov wrote:
>> Nick, if you need very fast spatial queries (spherical) you may use our
>> Q3C module for PostgreSQL (q3c.sf.net). We use it for providing access to
>> very big astronomical catalogs.
>>
>> Oleg

Tim

--
Tim Allen [EMAIL PROTECTED]
Proximity Pty Ltd  http://www.proximity.com.au/
Re: [GENERAL] Two billion records ok?
On Tue, 5 Sep 2006, Nick Bower wrote:
> Thanks - but what do you call big?

Several billions of stars. You can try our Cone Search service at
http://vo.astronet.ru/cas/conesearch.php

Oleg

> My application is satellite data btw so the reference could be useful.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [GENERAL] Two billion records ok?
Nick Bower wrote:
> We're considering using PostgreSQL for storing gridded metadata - each point
> of our grids has a variety of metadata attached to it (including lat/lon,
> measurements, etc) and would constitute a record in PostgreSQL+PostGIS.
> Size-wise, grids are about 4000x700 and are collected twice daily over say
> 10 years. As mentioned, each record would have up to 50 metadata attributes
> (columns) including geom, floats, varchars etc.
>
> So given 4000x700x2x365x10 ≈ 2 billion, is this going to be a problem if we
> will be wanting to query on datetimes, PostGIS lat/lon, and integer-based
> metadata flags? If however I'm forced to sub-sample the grid, what rule of
> thumb should I be looking to be constrained by?
>
> Thanks for any pointers, Nick

Tablespaces and table partitioning will be crucial to your needs. I'm not sure if you can partition indexes, though.

And too bad that compressed bit-map indexes have not been implemented yet. For indexes with high key cardinality, they save a *lot* of space, and queries can run a lot faster.

--
Ron Johnson, Jr.
Jefferson LA USA
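[A rough sketch of combining tablespaces with per-year child tables; the paths and names are made up, and the grid_obs parent table is the hypothetical one sketched earlier in the thread. On the index question: with inheritance-based partitioning each child table gets its own indexes, so the indexes are effectively partitioned along with the data.]

  -- The directory must already exist and be owned by the postgres user.
  CREATE TABLESPACE grid_2006_space LOCATION '/mnt/disk1/pg_grid_2006';

  -- Place the 2006 partition and its index on that tablespace so yearly
  -- partitions can be spread across separate disks or arrays.
  CREATE TABLE grid_obs_2006 (
      CHECK (obs_time >= DATE '2006-01-01' AND obs_time < DATE '2007-01-01')
  ) INHERITS (grid_obs)
  TABLESPACE grid_2006_space;

  CREATE INDEX grid_obs_2006_time_idx
      ON grid_obs_2006 (obs_time)
      TABLESPACE grid_2006_space;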
[GENERAL] Two billion records ok?
We're considering using PostgreSQL for storing gridded metadata - each point of our grids has a variety of metadata attached to it (including lat/lon, measurements, etc) and would constitute a record in PostgreSQL+PostGIS. Size-wise, grids are about 4000x700 and are collected twice daily over say 10 years. As mentioned, each record would have up to 50 metadata attributes (columns) including geom, floats, varchars etc.

So given 4000x700x2x365x10 ≈ 2 billion, is this going to be a problem if we will be wanting to query on datetimes, PostGIS lat/lon, and integer-based metadata flags? If however I'm forced to sub-sample the grid, what rule of thumb should I be looking to be constrained by?

Thanks for any pointers,
Nick

PS - Feel free to throw in any other ideas of grid-suitable databases :)
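[For concreteness, a cut-down sketch of the kind of table and indexes the question describes. Column names are guesses, not the actual schema; the geometry column and GiST index assume PostGIS, and older PostGIS releases may need an explicit operator class such as gist_geometry_ops.]

  CREATE TABLE grid_point (
      id          bigserial PRIMARY KEY,   -- 64-bit key: 2^31 is far too small here
      obs_time    timestamp NOT NULL,
      geom        geometry,                -- PostGIS lat/lon point
      qa_flag     integer,
      measurement double precision,
      source      varchar(32)
      -- ...up to ~50 metadata columns in practice
  );

  -- Indexes matching the stated query patterns.
  CREATE INDEX grid_point_time_idx ON grid_point (obs_time);
  CREATE INDEX grid_point_geom_idx ON grid_point USING gist (geom);
  CREATE INDEX grid_point_flag_idx ON grid_point (qa_flag);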
Re: [GENERAL] Two billion records ok?
On Tue, Sep 05, 2006 at 09:26:59AM +0800, Nick Bower wrote:
> So given 4000x700x2x365x10 ≈ 2 billion, is this going to be a problem if we
> will be wanting to query on datetimes, PostGIS lat/lon, and integer-based
> metadata flags?

That figure is about 20 billion, which is indeed more than 2 billion :-). If you plan to use integer row IDs then you'll need to use 64-bit bigint/bigserial instead of 32-bit integer/serial.

I haven't worked with a database that large; maybe somebody else can give additional advice.

--
Michael Fuhr
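[A minimal illustration of the key-type point, with a hypothetical table: a 32-bit serial sequence tops out at 2,147,483,647, far below ~20 billion rows, so a bigserial (bigint-backed) key is used instead.]

  CREATE TABLE obs (
      id       bigserial PRIMARY KEY,   -- bigint sequence, not 32-bit serial
      obs_time timestamp NOT NULL
  );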
Re: [GENERAL] Two billion records ok?
Nick, if you need very fast spatial queries (spherical) you may use our Q3C module for PostgreSQL (q3c.sf.net). We use it for providing access to very big astronomical catalogs.

Oleg

On Tue, 5 Sep 2006, Nick Bower wrote:
> We're considering using PostgreSQL for storing gridded metadata - each point
> of our grids has a variety of metadata attached to it (including lat/lon,
> measurements, etc) and would constitute a record in PostgreSQL+PostGIS.
> Size-wise, grids are about 4000x700 and are collected twice daily over say
> 10 years. As mentioned, each record would have up to 50 metadata attributes
> (columns) including geom, floats, varchars etc.
>
> So given 4000x700x2x365x10 ≈ 2 billion, is this going to be a problem if we
> will be wanting to query on datetimes, PostGIS lat/lon, and integer-based
> metadata flags? If however I'm forced to sub-sample the grid, what rule of
> thumb should I be looking to be constrained by?
>
> Thanks for any pointers, Nick
>
> PS - Feel free to throw in any other ideas of grid-suitable databases :)

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
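[For reference, typical Q3C usage looks roughly like the following. The function names come from the q3c.sf.net documentation, and the table and column names are made up, so treat the exact signatures as an assumption rather than a definitive API.]

  -- Index the sky position using Q3C's pixelisation function.
  CREATE INDEX cat_q3c_idx ON catalog (q3c_ang2ipix(ra, dec));
  ANALYZE catalog;

  -- Cone search: all rows within 0.1 degrees of (ra, dec) = (37.5, 42.1).
  SELECT *
  FROM catalog
  WHERE q3c_radial_query(ra, dec, 37.5, 42.1, 0.1);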
Re: [GENERAL] Two billion records ok?
Thanks - but what do you call big?

My application is satellite data btw so the reference could be useful.

On Tue, 5 Sep 2006 01:40 pm, Oleg Bartunov wrote:
> Nick, if you need very fast spatial queries (spherical) you may use our Q3C
> module for PostgreSQL (q3c.sf.net). We use it for providing access to very
> big astronomical catalogs.
>
> Oleg