Re: [HACKERS] Fixed length data types issue

Mark Dilger Thu, 14 Sep 2006 09:43:07 -0700

My apologies if you are seeing this twice. I posted it last night, but it still does not appear to have made it to the group.


Mark Dilger wrote:

Tom Lane wrote:
Mark Dilger <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
Please provide a stack trace --- AFAIK there shouldn't be any reason why
a pass-by-ref 3-byte type wouldn't work.
(gdb) bt
#0  0xb7e01d45 in memcpy () from /lib/libc.so.6
#1  0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, values=0x83c2e84, isnull=0x83c2e98 "", data=0x83c2ef4 "", infomask=0x83c2ef0, bit=0x0)
     at heaptuple.c:181
Hm, are you sure you provided a valid pointer (not the integer value
itself) as the Datum output from int3_in?

(Looks at patch ... ) Um, I think you didn't, although that coding
is far too cute to be actually readable ...

            regards, tom lane
Ok, I have it working on my Intel architecture machine. Here are some of my findings. Disk usage is calculated by running 'du -b' in /usr/local/pgsql/data before and after loading the table, and taking the difference. That directory is deleted, recreated, and initdb rerun between each test. The host system is a dual processor, dual core 2.4 GHz system, 2 GB DDR400 memory, 10,000 RPM SCSI ultra160 hard drive with the default postgresql.conf file as created by initdb. The code is the stock postgresql-8.1.4 release tarball compiled with gcc and configured without debug or cassert options enabled.
INT3 VS INT4
------------
Using a table of 8 integers per row and 16777216 rows, I can drop the disk usage from 1.2 GB down to 1.0 GB by defining those integers as int3 rather than int4. (It works out to about 70.5 bytes per row vs. 62.5 bytes per row.) However, the load time actually increases, probably due to CPU/memory usage. The time increased from 197 seconds to 213 seconds. Note that int3 is defined pass-by-reference due to a limitation in the code that prevents pass-by-value for any data size other than 1, 2, or 4 bytes.
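As a sanity check on those numbers, the per-row figures can be multiplied back out to totals. This is only a sketch of the arithmetic, using the row count and bytes-per-row values quoted above; the names (ROWS, COLS, total_bytes) are mine and purely illustrative, and I am assuming GB here means 10^9 bytes.

```python
# Figures taken from the measurements above.
ROWS = 16_777_216   # rows loaded into the table
COLS = 8            # integer columns per row


def total_bytes(bytes_per_row, rows=ROWS):
    """Total on-disk size implied by an average per-row footprint."""
    return bytes_per_row * rows


int4_total = total_bytes(70.5)   # 8 x int4 per row
int3_total = total_bytes(62.5)   # 8 x int3 per row

saved = int4_total - int3_total
saved_per_column = saved / ROWS / COLS

# Assumption: GB is taken as 10**9 bytes when reporting.
print(f"int4 table: {int4_total / 1e9:.2f} GB")
print(f"int3 table: {int3_total / 1e9:.2f} GB")
print(f"saved per row: {saved / ROWS:.1f} bytes")
print(f"saved per column: {saved_per_column:.1f} bytes")
```

The 8 bytes saved per row is exactly one byte per column, which is what you would expect if the eight int3 values are packed with no padding between them.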
Using a table of only one integer per row, the table size is exactly the same (down to the byte) whether I use int3 or int4. I suspect this is due to data alignment for the row being on at least a 4 byte boundary.
Creating an index on a single column of the 8-integer-per-row table, the index size is exactly the same whether the integers are int3 or int4. Once again, I suspect that data alignment is eliminating the space savings.
I haven't tested this, but I suspect that if the column following an int3 is aligned on a 4 or 8 byte boundary, the int3 column will have an extra byte of padding after it and hence will show no gain.
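To illustrate the padding effect I am guessing at, here is a toy model of laying out fixed-length columns with alignment. It is not PostgreSQL's tuple-forming code; align_up and layout are hypothetical helpers, and the sizes and alignments passed in are assumptions chosen to mirror the cases above (int3 with 1-byte alignment, int4 with 4-byte alignment, rows rounded up to a 4-byte boundary).

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(columns, row_alignment=4):
    """Place (name, size, alignment) columns in order.

    Returns a list of (name, start_offset, padding_before) and the
    row length after rounding up to row_alignment.
    """
    offset = 0
    placed = []
    for name, size, alignment in columns:
        start = align_up(offset, alignment)
        placed.append((name, start, start - offset))
        offset = start + size
    return placed, align_up(offset, row_alignment)


# Assumed type properties for illustration: (size, alignment).
INT3 = (3, 1)
INT4 = (4, 4)

# One int3 per row: the row is padded back up to 4 bytes.
_, single_int3_len = layout([("a", *INT3)])
_, single_int4_len = layout([("a", *INT4)])

# int3 followed by an int4: one pad byte precedes the int4.
mixed_cols, mixed_len = layout([("a", *INT3), ("b", *INT4)])

# Eight int3 columns pack tightly; eight int4 columns need 32 bytes.
_, eight_int3_len = layout([(f"c{i}", *INT3) for i in range(8)])
_, eight_int4_len = layout([(f"c{i}", *INT4) for i in range(8)])

print(single_int3_len, single_int4_len)
print(mixed_cols, mixed_len)
print(eight_int3_len, eight_int4_len)
```

Under these assumptions the single-column rows come out the same length, the mixed row loses its saved byte to padding, and the eight-column rows differ by 8 bytes, which matches the pattern in the measurements.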
INT1 VS INT2
------------
Once again using a table of 8 integers per row and 16777216 rows, I can drop the disk usage from 909 MB down to 774 MB by defining those integers as int1 rather than int2. (54 bytes per row vs 46 bytes per row.) The load time also drops, from 179 seconds to 159 seconds. Note that int1 is defined pass-by-value.
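For comparing the two experiments side by side, the relative changes can be worked out from the per-row sizes and load times reported above. This is just arithmetic on those figures; pct_change and the dictionary layout are illustrative names, not anything from the PostgreSQL tree.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (bytes per row, load time in seconds) for the wider and narrower type,
# taken from the two tests above.
experiments = {
    "int4 -> int3": {"size": (70.5, 62.5), "time": (197, 213)},
    "int2 -> int1": {"size": (54, 46), "time": (179, 159)},
}

results = {
    name: {
        "size_pct": pct_change(*figures["size"]),
        "time_pct": pct_change(*figures["time"]),
    }
    for name, figures in experiments.items()
}

for name, r in results.items():
    print(f"{name}: size {r['size_pct']:+.1f}%, load time {r['time_pct']:+.1f}%")
```

Both changes save 8 bytes per row, but only the pass-by-value int1 case also loads faster; the pass-by-reference int3 case trades roughly an 11% size reduction for an 8% longer load.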
mark



