Re: [GENERAL] Help estimating database and WAL size

2012-10-19 Thread Frank Lanitz

Am 2012-10-15 23:13, schrieb John R Pierce:

On 10/15/12 2:03 PM, Daniel Serodio (lists) wrote:

John R Pierce wrote:

On 10/08/12 1:39 PM, Daniel Serodio (lists) wrote:

3) Estimate the size of the transaction log
** We've got no idea how to estimate this, need advice **


postgres doesn't have a 'transaction log', it has the WAL 
(Write-Ahead Logs).  These are typically 16MB each.  on databases 
with a really heavy write load, I might bump the checkpoint_segments 
as high as 60, which seems to result in about 120 of them being 
created, 2GB total.  these files get reused, unless you are archiving 
them to implement a continuous realtime backup system (which enables 
"PITR", Point in Time Recovery)
Thanks, I was using the term "transaction log" as a synonym for WAL. 
We're planning on enabling PITR; how can we calculate the WAL size and 
the WAL archive size in this case?



it's based on how much data you're writing to the database.  When
you write tuples (rows) to the database, they are stored in 8K
pages/blocks in the data files, and the changes are also recorded in
the current WAL file as they are committed; when that WAL file fills
up, or archive_timeout (if set) forces an early segment switch, the
completed WAL file is handed to the archive.

To be able to utilize PITR, you need a complete base backup of the
file system, and /all/ the archived WAL files since that base backup
was taken.


In many cases you will also write these files to some kind of network
storage, e.g. via CIFS or NFS, so that your warm-standby machines have
access to them. My point is: this takes up some storage, but it can be
sized more or less independently of the database itself.
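
A minimal sketch of such a setup, assuming a 9.x server and a
hypothetical NFS mount at /mnt/wal_archive (the test guards against
overwriting an already-archived segment, as in the documentation's
example):

    # postgresql.conf
    wal_level = archive
    archive_mode = on
    archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'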


Cheers,
Frank







Re: [GENERAL] Help estimating database and WAL size

2012-10-19 Thread Jasen Betts
On 2012-10-15, Daniel Serodio (lists)  wrote:
>>
>> OID is optional, IIRC PGXID is not
> I hadn't heard of PGXID, I've just searched Google but found no 
> reference to this term except for this e-mail thread and some source 
> code. What is PGXID? Where can I learn more about it?

That was the wrong name; there's some sort of ID associated with the row.
It's part of the fixed per-row overhead, and it may vary between
different versions of the row.
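
For reference, these appear to be the transaction IDs (xmin/xmax) kept
in every tuple header; PostgreSQL exposes them as hidden system
columns, so you can look at them directly (table name hypothetical):

    SELECT xmin, xmax, ctid, * FROM mytable LIMIT 1;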

-- 
⚂⚃ 100% natural





Re: [GENERAL] Help estimating database and WAL size

2012-10-15 Thread John R Pierce

On 10/15/12 2:03 PM, Daniel Serodio (lists) wrote:

John R Pierce wrote:

On 10/08/12 1:39 PM, Daniel Serodio (lists) wrote:

3) Estimate the size of the transaction log
** We've got no idea how to estimate this, need advice ** 


postgres doesn't have a 'transaction log', it has the WAL 
(Write-Ahead Logs).  These are typically 16MB each.  on databases 
with a really heavy write load, I might bump the checkpoint_segments 
as high as 60, which seems to result in about 120 of them being 
created, 2GB total.  these files get reused, unless you are archiving 
them to implement a continuous realtime backup system (which enables 
"PITR", Point in Time Recovery)
Thanks, I was using the term "transaction log" as a synonym for WAL. 
We're planning on enabling PITR; how can we calculate the WAL size and 
the WAL archive size in this case? 



it's based on how much data you're writing to the database.  When you
write tuples (rows) to the database, they are stored in 8K pages/blocks
in the data files, and the changes are also recorded in the current WAL
file as they are committed; when that WAL file fills up, or
archive_timeout (if set) forces an early segment switch, the completed
WAL file is handed to the archive.


To be able to utilize PITR, you need a complete base backup of the file 
system, and /all/ the archived WAL files since that base backup was taken.
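
If you'd rather measure than guess: sample the WAL insert position
before and after a representative workload and subtract. A sketch,
assuming 9.2's pg_xlog_location_diff (on older releases, compare the
two locations by hand):

    SELECT pg_current_xlog_location();  -- note the result, e.g. '0/16B3D80'
    -- ... run a representative workload for an hour or a day ...
    SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/16B3D80');
    -- bytes of WAL generated; with archiving on, this is also roughly
    -- what lands in the archive over the same interval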



--
john r pierce                          N 37, W 122
santa cruz ca                          mid-left coast





Re: [GENERAL] Help estimating database and WAL size

2012-10-15 Thread Daniel Serodio (lists)

John R Pierce wrote:

On 10/08/12 1:39 PM, Daniel Serodio (lists) wrote:

3) Estimate the size of the transaction log
** We've got no idea how to estimate this, need advice ** 


postgres doesn't have a 'transaction log', it has the WAL (Write-Ahead 
Logs).  These are typically 16MB each.  on databases with a really 
heavy write load, I might bump the checkpoint_segments as high as 60, 
which seems to result in about 120 of them being created, 2GB total.  
these files get reused, unless you are archiving them to implement a 
continuous realtime backup system (which enables "PITR", Point in Time 
Recovery)
Thanks, I was using the term "transaction log" as a synonym for WAL. 
We're planning on enabling PITR; how can we calculate the WAL size and 
the WAL archive size in this case?


Regards,
Daniel Serodio




Re: [GENERAL] Help estimating database and WAL size

2012-10-15 Thread Daniel Serodio (lists)

Jasen Betts wrote:

On 2012-10-08, Daniel Serodio (lists)  wrote:

We are preparing a PostgreSQL database for production usage and we need
to estimate the storage size for this database. We're a team of
developers with little expertise in database administration, so we are
doing research, reading manuals and using our general IT knowledge to
achieve this.

We have actual data to migrate to this database and some rough
estimations of growth. For the sake of the example, let's say we have an
estimated growth of 50% per year.

The point is: what's the proper general technique for doing a good size
estimation?

We are estimating the storage usage by the following rules. Topics where
we need advice are marked with ** asterisks **. Feedback on the whole
process is more than welcome.

1) Estimate the size of each table
  1.1) Discover the actual size of each row.
  - For fields with a fixed size (like bigint, char, etc) we used
the sizes described in the documentation
  - For fields with a dynamic size (like text) we estimated the
string length and used the function select pg_column_size('expected text
here'::text)


long text is subject to compression; pg_column_size doesn't seem to
take compression into account when given a literal. The compression is
some sort of LZ.
Interesting, I didn't know about automatic compression. I've just read 
the section on TOAST and haven't been able to answer this either: Is 
there any way to check for the compressed size?
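
One way to see the effect (a sketch; the table and contents are made
up): pg_column_size() on a value read from a table reports the stored,
possibly TOAST-compressed size, while the same function on a literal
reports the uncompressed datum size:

    CREATE TEMP TABLE toast_test (body text);
    INSERT INTO toast_test VALUES (repeat('compressible text ', 10000));
    SELECT pg_column_size(body) AS stored_size,  -- compressed on disk
           pg_column_size(repeat('compressible text ', 10000)) AS literal_size
    FROM toast_test;

On a repetitive string like this, stored_size comes out far smaller
than literal_size.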

  - We added 4 more bytes for the OID that PostgreSQL uses internally


OID is optional, IIRC PGXID is not
I hadn't heard of PGXID, I've just searched Google but found no 
reference to this term except for this e-mail thread and some source 
code. What is PGXID? Where can I learn more about it?

  1.2) Multiply the size of each row by the number of estimated rows
  ** Do I need to consider any overhead here, like row or table
metadata? **


page size 8K
column overhead 1 byte per not-NULL column, NULLs are free,
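
Rather than adding those overheads up by hand, it may be easier to
load a representative sample and ask PostgreSQL what it actually
allocated, then scale up (table name hypothetical):

    SELECT pg_size_pretty(pg_relation_size('mytable'))       AS heap_only,
           pg_size_pretty(pg_total_relation_size('mytable')) AS with_indexes_and_toast;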


2) Estimate the size of each table index
  ** Don't know how to estimate this, need advice here **


IIRC
( data being indexed + 8 bytes ) / fill factor
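
As a rough worked example of that rule of thumb (numbers invented): a
btree index over a bigint column carries 8 bytes of data plus about 8
bytes of item overhead per row; at the default btree fillfactor of 90%
that is (8 + 8) / 0.9 ≈ 18 bytes per indexed row, before page-level
overhead.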


3) Estimate the size of the transaction log
  ** We've got no idea how to estimate this, need advice **


how big are your transactions?

Very short, a couple of statements each.

4) Estimate the size of the backups (full and incremental)
  ** Don't know how to estimate this, need advice here **


depends on the format you use, backups tend to compress well.
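
For instance, pg_dump's custom format is compressed by default
(database and file names hypothetical):

    pg_dump -Fc -f mydb.dump mydb

Dumping a representative sample and comparing sizes gives you a
compression ratio to plan with.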


5) Sum all the estimates for the actual minimum size


no, you get estimated size.

Thanks a lot for the response.

Regards,
Daniel Serodio

6) Apply a factor of 1.5x (the 50% growth) to the sum of the estimates
1, 2 and 4 for the minimum size after 1 year

7) Apply an overall factor of 1.2 ~ 1.4 (20% to 40% more) to estimates 5
and 6 for a good safety margin

I know the rules got pretty extensive, please let me know if you need
more data or examples for a better understanding.

We've also posted this question to
http://dba.stackexchange.com/q/25617/10166


Re: [GENERAL] Help estimating database and WAL size

2012-10-12 Thread John R Pierce

On 10/08/12 1:39 PM, Daniel Serodio (lists) wrote:

3) Estimate the size of the transaction log
** We've got no idea how to estimate this, need advice ** 


postgres doesn't have a 'transaction log', it has the WAL (Write-Ahead 
Logs).  These are typically 16MB each.  on databases with a really heavy 
write load, I might bump the checkpoint_segments as high as 60, which 
seems to result in about 120 of them being created, 2GB total.  these 
files get reused, unless you are archiving them to implement a 
continuous realtime backup system (which enables "PITR", Point in Time 
Recovery)
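
For a rough upper bound on pg_xlog itself, the 9.x documentation puts
the steady-state segment count at about
(2 + checkpoint_completion_target) * checkpoint_segments + 1; with
checkpoint_segments = 60 and the default completion target of 0.5,
that comes to about 151 files of 16 MB each, i.e. roughly 2.4 GB, in
the same ballpark as the observation above.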




--
john r pierce                          N 37, W 122
santa cruz ca                          mid-left coast





Re: [GENERAL] Help estimating database and WAL size

2012-10-12 Thread Jasen Betts
On 2012-10-08, Daniel Serodio (lists)  wrote:
> We are preparing a PostgreSQL database for production usage and we need 
> to estimate the storage size for this database. We're a team of 
> developers with little expertise in database administration, so we are 
> doing research, reading manuals and using our general IT knowledge to 
> achieve this.
>
> We have actual data to migrate to this database and some rough 
> estimations of growth. For the sake of the example, let's say we have an 
> estimated growth of 50% per year.
>
> The point is: what's the proper general technique for doing a good size 
> estimation?
>
> We are estimating the storage usage by the following rules. Topics where 
> we need advice are marked with ** asterisks **. Feedback on the whole 
> process is more than welcome.
>
> 1) Estimate the size of each table
>  1.1) Discover the actual size of each row.
>  - For fields with a fixed size (like bigint, char, etc) we used 
> the sizes described in the documentation
>  - For fields with a dynamic size (like text) we estimated the 
> string length and used the function select pg_column_size('expected text 
> here'::text)

long text is subject to compression; pg_column_size doesn't seem to
take compression into account when given a literal. The compression is
some sort of LZ.

>  - We added 4 more bytes for the OID that PostgreSQL uses internally

OID is optional, IIRC PGXID is not 

>  1.2) Multiply the size of each row by the number of estimated rows
>  ** Do I need to consider any overhead here, like row or table 
> metadata? **

page size 8K
column overhead 1 byte per not-NULL column, NULLs are free,

> 2) Estimate the size of each table index
>  ** Don't know how to estimate this, need advice here **

IIRC
( data being indexed + 8 bytes ) / fill factor
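
You can also check that estimate against a real index built over
sample data (index name hypothetical); pg_relation_size works on
indexes as well as tables:

    SELECT pg_size_pretty(pg_relation_size('mytable_pkey'));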


> 3) Estimate the size of the transaction log
>  ** We've got no idea how to estimate this, need advice **

how big are your transactions?

> 4) Estimate the size of the backups (full and incremental)
>  ** Don't know how to estimate this, need advice here **

depends on the format you use, backups tend to compress well.

> 5) Sum all the estimates for the actual minimum size

no, you get estimated size.

> 6) Apply a factor of 1.5x (the 50% growth) to the sum of the estimates 
> 1, 2 and 4 for the minimum size after 1 year
>
> 7) Apply an overall factor of 1.2 ~ 1.4 (20% to 40% more) to estimates 5 
> and 6 for a good safety margin
>
> I know the rules got pretty extensive, please let me know if you need 
> more data or examples for a better understanding.
>
> We've also posted this question to 
> http://dba.stackexchange.com/q/25617/10166
>
> Thanks in advance,
> Daniel Serodio
>
>


-- 
⚂⚃ 100% natural





Re: [GENERAL] Help estimating database and WAL size

2012-10-08 Thread Gavin Flower

On 09/10/12 09:39, Daniel Serodio (lists) wrote:
We are preparing a PostgreSQL database for production usage and we 
need to estimate the storage size for this database. We're a team of 
developers with little expertise in database administration, so we are 
doing research, reading manuals and using our general IT knowledge to 
achieve this.

We have actual data to migrate to this database and some rough 
estimations of growth. For the sake of the example, let's say we have 
an estimated growth of 50% per year.

The point is: what's the proper general technique for doing a good 
size estimation?


We are estimating the storage usage by the following rules. Topics 
where we need advice are marked with ** asterisks **. Feedback on the 
whole process is more than welcome.


1) Estimate the size of each table
1.1) Discover the actual size of each row.
- For fields with a fixed size (like bigint, char, etc) we 
used the sizes described in the documentation
- For fields with a dynamic size (like text) we estimated the 
string length and used the function select pg_column_size('expected 
text here'::text)
- We added 4 more bytes for the OID that PostgreSQL uses 
internally

1.2) Multiply the size of each row by the number of estimated rows
** Do I need to consider any overhead here, like row or table 
metadata? **


2) Estimate the size of each table index
** Don't know how to estimate this, need advice here **

3) Estimate the size of the transaction log
** We've got no idea how to estimate this, need advice **

4) Estimate the size of the backups (full and incremental)
** Don't know how to estimate this, need advice here **

5) Sum all the estimates for the actual minimum size

6) Apply a factor of 1.5x (the 50% growth) to the sum of the estimates 
1, 2 and 4 for the minimum size after 1 year


7) Apply an overall factor of 1.2 ~ 1.4 (20% to 40% more) to estimates 
5 and 6 for a good safety margin


I know the rules got pretty extensive, please let me know if you need 
more data or examples for a better understanding.


We've also posted this question to 
http://dba.stackexchange.com/q/25617/10166


Thanks in advance,
Daniel Serodio



You also have to allow for table & index bloat.

When a record is DELETEd or UPDATEd, the space used on the disk is not 
automatically reclaimed. So in a very volatile database, the size of the 
data files could be several times bigger than the actual data storage 
requires. There are automatic and manual procedures (look up VACUUM) 
that can keep this under control. However, you will still need to 
account for bloat. The extent of bloat depends very much on your usage 
patterns.
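
A minimal way to keep an eye on it, via the statistics views (note
that plain VACUUM frees dead-row space for reuse rather than shrinking
the files):

    SELECT relname, n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC;

    VACUUM ANALYZE;  -- reclaim dead-row space, refresh planner statistics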



Cheers,
Gavin



[GENERAL] Help estimating database and WAL size

2012-10-08 Thread Daniel Serodio (lists)
We are preparing a PostgreSQL database for production usage and we need 
to estimate the storage size for this database. We're a team of 
developers with little expertise in database administration, so we are 
doing research, reading manuals and using our general IT knowledge to 
achieve this.

We have actual data to migrate to this database and some rough 
estimations of growth. For the sake of the example, let's say we have an 
estimated growth of 50% per year.

The point is: what's the proper general technique for doing a good size 
estimation?


We are estimating the storage usage by the following rules. Topics where 
we need advice are marked with ** asterisks **. Feedback on the whole 
process is more than welcome.


1) Estimate the size of each table
1.1) Discover the actual size of each row.
- For fields with a fixed size (like bigint, char, etc) we used 
the sizes described in the documentation
- For fields with a dynamic size (like text) we estimated the 
string length and used the function select pg_column_size('expected text 
here'::text)

- We added 4 more bytes for the OID that PostgreSQL uses internally
1.2) Multiply the size of each row by the number of estimated rows
** Do I need to consider any overhead here, like row or table 
metadata? **


2) Estimate the size of each table index
** Don't know how to estimate this, need advice here **

3) Estimate the size of the transaction log
** We've got no idea how to estimate this, need advice **

4) Estimate the size of the backups (full and incremental)
** Don't know how to estimate this, need advice here **

5) Sum all the estimates for the actual minimum size

6) Apply a factor of 1.5x (the 50% growth) to the sum of the estimates 
1, 2 and 4 for the minimum size after 1 year


7) Apply an overall factor of 1.2 ~ 1.4 (20% to 40% more) to estimates 5 
and 6 for a good safety margin
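
As a worked instance of steps 5-7 with invented numbers: if tables,
indexes and backups sum to 100 GB today, step 6 gives 100 GB * 1.5 =
150 GB after one year, and step 7 widens that to roughly 180-210 GB
(150 * 1.2 through 150 * 1.4) as the figure to provision.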


I know the rules got pretty extensive, please let me know if you need 
more data or examples for a better understanding.


We've also posted this question to 
http://dba.stackexchange.com/q/25617/10166


Thanks in advance,
Daniel Serodio

