My apologies if you are seeing this twice. I posted it last night, but it still does not appear to have made it to the group.

Mark Dilger wrote:
Tom Lane wrote:
Mark Dilger <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
Please provide a stack trace --- AFAIK there shouldn't be any reason why
a pass-by-ref 3-byte type wouldn't work.

(gdb) bt
#0  0xb7e01d45 in memcpy () from /lib/libc.so.6
#1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, values=0x83c2e84, isnull=0x83c2e98 "", data=0x83c2ef4 "", infomask=0x83c2ef0, bit=0x0)
     at heaptuple.c:181

Hm, are you sure you provided a valid pointer (not the integer value
itself) as the Datum output from int3_in?

(Looks at patch ... ) Um, I think you didn't, although that coding
is far too cute to be actually readable ...

            regards, tom lane

OK, I have it working on my Intel-architecture machine. Here are some of my findings. Disk usage is calculated by running 'du -b' in /usr/local/pgsql/data before and after loading the table and taking the difference. That directory is deleted and recreated, and initdb is rerun, between tests. The host system is a dual-processor, dual-core 2.4 GHz machine with 2 GB of DDR400 memory and a 10,000 RPM Ultra160 SCSI hard drive, using the default postgresql.conf file as created by initdb. The code is the stock postgresql-8.1.4 release tarball, compiled with gcc and configured without the debug or cassert options.


INT3 VS INT4
------------
Using a table of 8 integers per row and 16777216 rows, I can drop the disk usage from 1.2 GB down to 1.0 GB by defining those integers as int3 rather than int4. (It works out to about 70.5 bytes per row vs. 62.5 bytes per row.) However, the load time actually increases, from 197 seconds to 213 seconds, probably due to CPU/memory usage. Note that int3 is defined pass-by-reference due to a limitation in the code that prevents pass-by-value for any data size other than 1, 2, or 4 bytes.
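
As a rough cross-check of the figures above, the per-row sizes can be multiplied back out to totals. This is only a sketch of the arithmetic: the names ROWS, COLUMNS, and total_bytes are made up for illustration, and the 70.5 and 62.5 bytes-per-row inputs are the approximate values quoted above, not fresh measurements.

```python
# Arithmetic check of the int3 vs int4 disk-usage figures quoted above.
# ROWS, COLUMNS and total_bytes are illustrative names, not from the patch.
ROWS = 16_777_216   # rows in the test table (2**24)
COLUMNS = 8         # integer columns per row


def total_bytes(bytes_per_row: float, rows: int = ROWS) -> int:
    """Total on-disk bytes implied by an average per-row size."""
    return round(bytes_per_row * rows)


int4_total = total_bytes(70.5)   # approximate per-row size with int4
int3_total = total_bytes(62.5)   # approximate per-row size with int3

saved_per_row = 70.5 - 62.5
saved_per_column = saved_per_row / COLUMNS

print(f"int4 table: {int4_total / 1e9:.2f} GB")
print(f"int3 table: {int3_total / 1e9:.2f} GB")
print(f"saved: {saved_per_row} bytes/row, {saved_per_column} byte/column")
```

The 8 bytes saved per row is what one would expect if each of the 8 columns shrinks by exactly one byte with no padding between them.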

Using a table of only one integer per row, the table size is exactly the same (down to the byte) whether I use int3 or int4. I suspect this is due to data alignment for the row being on at least a 4 byte boundary.

Creating an index on a single column of the 8-integer-per-row table, the index size is exactly the same whether the integers are int3 or int4. Once again, I suspect that data alignment is eliminating the space savings.

I haven't tested this, but I suspect that if the column following an int3 is aligned on a 4 or 8 byte boundary, the int3 column will be padded with an extra byte and hence will yield no space savings.
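
The alignment suspicions above can be illustrated with a toy layout model. This is not PostgreSQL's actual tuple-forming code, and it ignores the tuple header entirely. It assumes (hypothetically) that int3 has 1-byte alignment, that int4 has 4-byte alignment, and that each row's data is rounded up to a 4-byte boundary; align_up and layout are names invented for this sketch.

```python
# Toy model of column alignment padding. NOT PostgreSQL's real code:
# the alignments and the 4-byte row boundary are assumptions.
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(columns, row_align: int = 4):
    """Given (size, alignment) pairs, return (column offsets, padded row size)."""
    offsets = []
    offset = 0
    for size, alignment in columns:
        offset = align_up(offset, alignment)  # pad before the column if needed
        offsets.append(offset)
        offset += size
    return offsets, align_up(offset, row_align)


INT3 = (3, 1)  # assumed: 3 bytes, byte-aligned
INT4 = (4, 4)  # 4 bytes, 4-byte aligned

print(layout([INT3]))        # one int3 column
print(layout([INT4]))        # one int4 column
print(layout([INT3, INT4]))  # int3 followed by an aligned int4
print(layout([INT3] * 8))    # eight int3 columns
print(layout([INT4] * 8))    # eight int4 columns
```

Under these assumptions a single int3 column occupies the same 4 bytes as a single int4, an int3 followed by an int4 gains a padding byte, and eight int3 columns come out 8 bytes smaller than eight int4 columns, which is consistent with the measurements above.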


INT1 VS INT2
------------
Once again using a table of 8 integers per row and 16777216 rows, I can drop the disk usage from 909 MB down to 774 MB by defining those integers as int1 rather than int2. (54 bytes per row vs 46 bytes per row.) The load time also drops, from 179 seconds to 159 seconds. Note that int1 is defined pass-by-value.
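
To put the two experiments side by side, the relative changes can be computed from the numbers reported in this message. The helper name percent_change is made up for this sketch, and the inputs are the rounded figures above, so the percentages are approximate.

```python
# Relative changes computed from the rounded figures in this message.
def percent_change(before: float, after: float) -> float:
    """Signed percentage change from before to after."""
    return (after - before) / before * 100.0


int1_size_change = percent_change(909, 774)  # MB, int2 -> int1
int1_time_change = percent_change(179, 159)  # seconds, int2 -> int1
int3_time_change = percent_change(197, 213)  # seconds, int4 -> int3

print(f"int1 vs int2 disk usage: {int1_size_change:+.1f}%")
print(f"int1 vs int2 load time:  {int1_time_change:+.1f}%")
print(f"int3 vs int4 load time:  {int3_time_change:+.1f}%")
```

The pass-by-value int1 loads faster than int2, while the pass-by-reference int3 loads slower than int4, even though both save about one byte per column.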


mark

