Re: [HACKERS] [GENERAL] Large databases, performance

Manfred Koizar Mon, 07 Oct 2002 06:42:57 -0700

On Mon, 07 Oct 2002 15:07:29 +0530, "Shridhar Daithankar"
<[EMAIL PROTECTED]> wrote:
>Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql. All 
>numbers include indexes. This is really going to be a problem when things are 
>deployed. Any idea how can it be taken down?


Shridhar,

if i'm not mistaken, a char(n)/varchar(n) column is stored as a 32-bit
integer specifying the length followed by as many characters as the
length tells.  On 32-bit Intel hardware this structure is aligned on a
4-byte boundary.

For your row layout this gives the following sizes (look at the "phys
size" column):

| Field    Field     Null Indexed phys  mini
| Name     Type                   size      
|--------------------------------------------
| type     int        no    no       4     4
| esn      char (10)  no    yes     16    11
| min      char (10)  no    yes     16    11
| datetime timestamp  no    yes      8     8
| opc0     char (3)   no    no       8     4
| opc1     char (3)   no    no       8     4
| opc2     char (3)   no    no       8     4
| dpc0     char (3)   no    no       8     4
| dpc1     char (3)   no    no       8     4
| dpc2     char (3)   no    no       8     4
| npa      char (3)   no    no       8     4
| nxx      char (3)   no    no       8     4
| rest     char (4)   no    no       8     5
| field0   int        yes   no       4     4
| field1   char (4)   yes   no       8     5
| field2   int        yes   no       4     4
| field3   char (4)   yes   no       8     5
| field4   int        yes   no       4     4
| field5   char (4)   yes   no       8     5
| field6   int        yes   no       4     4
| field7   char (4)   yes   no       8     5
| field8   int        yes   no       4     4
| field9   char (4)   yes   no       8     5
|                                 ----- -----
|                                  176   116

Ignoring nulls for now, you have to add 32 bytes for a v7.2 heap tuple
header and 4 bytes for ItemIdData per tuple, ending up with 212 bytes
per tuple or ca. 85 GB heap space for 432000000 tuples.  Depending on
fill factor similar calculations give some 30 GB for your index.

Now if we had a datatype with only one byte for the string length,
char columns could be byte aligned and we'd have column sizes given
under "mini" in the table above.  The columns would have to be
rearranged according to alignment requirements.

Thus 60 bytes per heap tuple and 8 bytes per index tuple could be
saved, resulting in a database size of ~ 85 GB (index included).  And
I bet this would be significantly faster, too.

Hackers, do you think it's possible to hack together a quick and dirty
patch, so that string length is represented by one byte?  IOW can a
database be built that doesn't contain any char/varchar/text value
longer than 255 characters in the catalog?

If I'm not told that this is impossibly, I'd give it a try.  Shridhar,
if such a patch can be made available, would you be willing to test
it?

What can you do right now?  Try using v7.3 beta and creating your
table WITHOUT OIDS.  This saves 8 bytes per tuple; not much, but
better save 4% than nothing.

Servus
 Manfred

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] [GENERAL] Large databases, performance

Reply via email to