Re: [GENERAL] Two billion records ok?

2006-09-07 Thread Brent Wood

Nick Bower wrote:
We're considering using Postgresql for storing gridded metadata - each point 
of our grids has a variety of metadata attached to it (including lat/lon, 
measurements, etc) and would constitute a record in Postgresql+Postgis.


Size-wise, grids are about 4000x700 and are collected twice daily over say 10 
years.  As mentioned, each record would have up to 50 metadata attributes 
(columns) including geom, floats, varchars etc.


So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if we 
will be wanting to query on datetimes, Postgis lat/lon, and integer-based 
metadata flags?
  


Hmmm... 2 billion looks optimistic... I get 2.044e+10, which is about 20 billion.

I'm currently working with a table of over 200,000,000 records. With a 
clustered index & underlying partitioning, response times are more than 
acceptable: a 25-wide self relation (left outer join) over 3 months of data 
(records arrive at around 40/minute) takes about 2 minutes, and a simple 
query with a where clause on timestamp takes a few seconds at most. This is 
on a 32-bit Intel system with only 2GB of memory & mirrored 7200RPM SATA 
hard drives.


I'd suggest you partition on timestamp, maybe per year at least, & use a 
clustered index on timestamp. It should be viable if your hardware is up 
to it.
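
Something along these lines, for instance - a rough sketch only, with
made-up table and column names, using the inheritance-based partitioning
available in current releases:

  -- Parent table holding the common column definitions.
  CREATE TABLE grid_obs (
      obs_time  timestamptz NOT NULL,
      geom      geometry,
      flags     integer
      -- ... the other metadata columns
  );

  -- One child table per year; the CHECK constraint lets the planner
  -- skip irrelevant years when constraint_exclusion is enabled.
  CREATE TABLE grid_obs_2006 (
      CHECK (obs_time >= '2006-01-01' AND obs_time < '2007-01-01')
  ) INHERITS (grid_obs);

  CREATE INDEX grid_obs_2006_time_idx ON grid_obs_2006 (obs_time);

  -- Physically order the partition by timestamp (8.x spelling of CLUSTER).
  CLUSTER grid_obs_2006_time_idx ON grid_obs_2006;

  SET constraint_exclusion = on;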


I'd also strongly recommend a suitable platform: 64-bit Linux on AMD64 or 
Opteron, with as much memory & the fastest RAID setup you can afford. 
Make sure you use a fully 64-bit build of Postgres/Postgis on this 
platform as well.


If the same grid is being resampled, then a separate table defining the 
grid, and a join on grid ID to the main (partitioned) table, may improve 
performance (& reduce data volume).
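
As a sketch (again with hypothetical names), the split might look like
this - the per-row geom moves out of the big table into the small one:

  -- The fixed grid lives in one small table ...
  CREATE TABLE grid_cell (
      cell_id  serial PRIMARY KEY,
      geom     geometry NOT NULL     -- lat/lon point or cell polygon
  );

  -- ... and each (much narrower) observation row just references it.
  CREATE TABLE grid_obs (
      obs_time  timestamptz NOT NULL,
      cell_id   integer NOT NULL REFERENCES grid_cell (cell_id),
      flags     integer
      -- other per-observation metadata columns
  );

  -- Typical query: pick up the geometry via the small table.
  SELECT o.obs_time, o.flags, c.geom
  FROM   grid_obs  o
  JOIN   grid_cell c ON c.cell_id = o.cell_id
  WHERE  o.obs_time >= '2006-01-01' AND o.obs_time < '2006-02-01';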


I wouldn't expect instantaneous answers from it, but I don't know of any 
reason it won't work. It depends very much on what level of performance is 
considered acceptable.


I'm also using Postgis grids with up to a few million cells and spatial 
joins to a million or so tracklines to generate gridded models for 
analysis in R. You seem to be scaling this up from where I'm at, so I'd 
like to know how you get on.
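
For reference, the shape of that kind of spatial join is roughly as below
(table names are illustrative; ST_Intersects is spelled intersects() in
older PostGIS releases):

  -- GiST indexes so the && bounding-box operator is indexable.
  CREATE INDEX grid_cell_geom_idx ON grid_cell USING gist (geom);
  CREATE INDEX trackline_geom_idx ON trackline USING gist (geom);

  -- Which tracklines cross which grid cells?
  SELECT c.cell_id, t.track_id
  FROM   grid_cell c
  JOIN   trackline t
    ON   c.geom && t.geom                 -- fast, indexed bounding-box test
   AND   ST_Intersects(c.geom, t.geom);   -- exact geometry test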



Cheers,

  Brent Wood




Re: [GENERAL] Two billion records ok?

2006-09-05 Thread Tim Allen

Nick Bower wrote:

Thanks - but what do you call big?


How many stars do you think there are? :-)


My application is satellite data btw so the reference could be useful.

On Tue, 5 Sep 2006 01:40 pm, Oleg Bartunov wrote:


Nick,

if you need very fast spatial queries (spherical) you may use our
Q3C module for PostgreSQL (q3c.sf.net). We use it for providing access
to very big astronomical catalogs.

Oleg


Tim

--
---
Tim Allen  [EMAIL PROTECTED]
Proximity Pty Ltd  http://www.proximity.com.au/



Re: [GENERAL] Two billion records ok?

2006-09-05 Thread Oleg Bartunov

On Tue, 5 Sep 2006, Nick Bower wrote:


Thanks - but what do you call big?


Several billion stars. You can try our Cone Search service at
http://vo.astronet.ru/cas/conesearch.php

Oleg



My application is satellite data btw so the reference could be useful.

On Tue, 5 Sep 2006 01:40 pm, Oleg Bartunov wrote:

Nick,

if you need very fast spatial queries (spherical) you may use our
Q3C module for PostgreSQL (q3c.sf.net). We use it for providing access
to very big astronomical catalogs.


Oleg

On Tue, 5 Sep 2006, Nick Bower wrote:

We're considering using Postgresql for storing gridded metadata - each
point of our grids has a variety of metadata attached to it (including
lat/lon, measurements, etc) and would constitute a record in
Postgresql+Postgis.

Size-wise, grids are about 4000x700 and are collected twice daily over
say 10 years.  As mentioned, each record would have up to 50 metadata
attributes (columns) including geom, floats, varchars etc.

So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if
we will be wanting to query on datetimes, Postgis lat/lon, and
integer-based metadata flags?

If however I'm forced to sub-sample the grid, what rule of thumb should I
be looking to be constrained by?

Thanks for any pointers, Nick

PS - Feel free to throw in any other ideas of grid-suitable databases :)



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [GENERAL] Two billion records ok?

2006-09-05 Thread Ron Johnson

Nick Bower wrote:
 We're considering using Postgresql for storing gridded metadata - each point 
 of our grids has a variety of metadata attached to it (including lat/lon, 
 measurements, etc) and would constitute a record in Postgresql+Postgis.
 
 Size-wise, grids are about 4000x700 and are collected twice daily over say 10 
 years.  As mentioned, each record would have up to 50 metadata attributes 
 (columns) including geom, floats, varchars etc.
 
 So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if we 
 will be wanting to query on datetimes, Postgis lat/lon, and integer-based 
 metadata flags?
 
 If however I'm forced to sub-sample the grid, what rule of thumb should I be 
 looking to be constrained by?
 
 Thanks for any pointers, Nick

Tablespaces and table partitioning will be crucial to your needs.
I'm not sure if you can partition indexes, though.

And too bad that compressed bit-map indexes have not been
implemented yet.  For indexes with high key cardinality, they save
a *lot* of space, and queries can run a lot faster.
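
For what it's worth, with inheritance-style partitioning there is no
partitioned index as such: you create one index per child table, and each
can be placed in its own tablespace (names and paths below are purely
illustrative):

  -- A tablespace on a separate disk array (path is illustrative).
  CREATE TABLESPACE fastdisk LOCATION '/mnt/raid1/pg_indexes';

  -- One index per partition, placed wherever you like.
  CREATE INDEX grid_obs_2006_obs_time_idx
      ON grid_obs_2006 (obs_time)
      TABLESPACE fastdisk;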

--
Ron Johnson, Jr.
Jefferson LA  USA

Is common sense really valid?
For example, it is common sense to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that common sense is obviously wrong.



[GENERAL] Two billion records ok?

2006-09-04 Thread Nick Bower
We're considering using Postgresql for storing gridded metadata - each point 
of our grids has a variety of metadata attached to it (including lat/lon, 
measurements, etc) and would constitute a record in Postgresql+Postgis.

Size-wise, grids are about 4000x700 and are collected twice daily over say 10 
years.  As mentioned, each record would have up to 50 metadata attributes 
(columns) including geom, floats, varchars etc.

So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if we 
will be wanting to query on datetimes, Postgis lat/lon, and integer-based 
metadata flags?

If however I'm forced to sub-sample the grid, what rule of thumb should I be 
looking to be constrained by?

Thanks for any pointers, Nick

PS - Feel free to throw in any other ideas of grid-suitable databases :)



Re: [GENERAL] Two billion records ok?

2006-09-04 Thread Michael Fuhr
On Tue, Sep 05, 2006 at 09:26:59AM +0800, Nick Bower wrote:
 So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if we 
 will be wanting to query on datetimes, Postgis lat/lon, and integer-based 
 metadata flags?

That figure is about 20 billion, which is indeed > 2 billion :-).
If you plan to use integer row IDs then you'll need to use 64-bit
bigint/bigserial instead of 32-bit integer/serial.  I haven't worked
with a database that large; maybe somebody else can give additional
advice.
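
Concretely, that just means declaring any surrogate key as bigserial rather
than serial (the table name below is made up):

  -- serial wraps a 32-bit integer (max ~2.1 billion values);
  -- bigserial wraps a 64-bit bigint, plenty for ~20 billion rows.
  CREATE TABLE grid_obs (
      obs_id    bigserial PRIMARY KEY,
      obs_time  timestamptz NOT NULL
      -- remaining columns ...
  );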

-- 
Michael Fuhr



Re: [GENERAL] Two billion records ok?

2006-09-04 Thread Oleg Bartunov

Nick,

if you need very fast spatial queries (spherical) you may use our 
Q3C module for PostgreSQL (q3c.sf.net). We use it for providing access

to very big astronomical catalogs.
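
Typical usage is roughly as follows - check q3c.sf.net for the exact,
current function signatures, since this is from memory and the table name
is illustrative:

  -- Index the catalogue on the Q3C pixelisation of (ra, dec).
  CREATE INDEX stars_q3c_idx ON stars (q3c_ang2ipix(ra, dec));

  -- Cone search: everything within 0.1 degrees of ra=11.04, dec=40.96.
  SELECT * FROM stars
  WHERE  q3c_radial_query(ra, dec, 11.04, 40.96, 0.1);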


Oleg

On Tue, 5 Sep 2006, Nick Bower wrote:


We're considering using Postgresql for storing gridded metadata - each point
of our grids has a variety of metadata attached to it (including lat/lon,
measurements, etc) and would constitute a record in Postgresql+Postgis.

Size-wise, grids are about 4000x700 and are collected twice daily over say 10
years.  As mentioned, each record would have up to 50 metadata attributes
(columns) including geom, floats, varchars etc.

So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if we
will be wanting to query on datetimes, Postgis lat/lon, and integer-based
metadata flags?

If however I'm forced to sub-sample the grid, what rule of thumb should I be
looking to be constrained by?

Thanks for any pointers, Nick

PS - Feel free to throw in any other ideas of grid-suitable databases :)




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [GENERAL] Two billion records ok?

2006-09-04 Thread Nick Bower
Thanks - but what do you call big?

My application is satellite data btw so the reference could be useful.

On Tue, 5 Sep 2006 01:40 pm, Oleg Bartunov wrote:
 Nick,

 if you need very fast spatial queries (spherical) you may use our
 Q3C module for PostgreSQL (q3c.sf.net). We use it for providing access
 to very big astronomical catalogs.


 Oleg

 On Tue, 5 Sep 2006, Nick Bower wrote:
  We're considering using Postgresql for storing gridded metadata - each
  point of our grids has a variety of metadata attached to it (including
  lat/lon, measurements, etc) and would constitute a record in
  Postgresql+Postgis.
 
  Size-wise, grids are about 4000x700 and are collected twice daily over
  say 10 years.  As mentioned, each record would have up to 50 metadata
  attributes (columns) including geom, floats, varchars etc.
 
  So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if
  we will be wanting to query on datetimes, Postgis lat/lon, and
  integer-based metadata flags?
 
  If however I'm forced to sub-sample the grid, what rule of thumb should I
  be looking to be constrained by?
 
  Thanks for any pointers, Nick
 
  PS - Feel free to throw in any other ideas of grid-suitable databases :)
 

   Regards,
   Oleg
 _
 Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
 Sternberg Astronomical Institute, Moscow University, Russia
 Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
 phone: +007(495)939-16-83, +007(495)939-23-83
