On 07/04/11 17:57, Paul & Caroline Lewis wrote:

DB SIZE PROBLEM:
I have run the following tests on a Postgresql 9 with postgis 1.5
platform and am getting significant table and index size differences.
TestSet1 is run with a file tunnel6.asc, a CSV file with the following
being a sample of the data:
-6.34223029,53.39211958,132.586
The file is 6 GB in size with 70 million rows. After running TestSet1
the table has the correct number of rows (70 mill), random viewing of
the data and it seems fine, while the table size is 16 GB and the index
size is 6195 MB.

I now Drop Cascade the tunnel6 table for TestSet2

For TestSet2 it is run on a preprocessed version of tunnel6.asc, now
called tunnel6_py.asc, with the following being a sample of the data:
-6.34223029,53.39211958,132.586,SRID=4326;POINT(-6.34223029 53.39211958
132.586)
This file grows to 8 GB and still has 70 million rows but after
following TestSet2 steps while the table still has the correct number of
rows (70 mill), random viewing of the data and it still seems fine but
now the table size is 9.5 GB and the index size is 3363 MB.

Have I done something significantly wrong in these tests?
The TestSet2 process loads the data about 10 minutes quicker than
TestSet1 so I would like to use it but I don't trust it now given the
significant differences in table sizes.

Hi Paul,

Due to the way in which the GiST indexes work, the order in which data is fed into the index build function can have a noticeable effect on the size and performance of the resulting index. This is because depending upon the order, page splits can occur at different points in the build process and hence influence the overall structure of the index.

There has been some talk of optimising the split function to create better indexes, but no-one has really spent much time on it. That said, it appears that whatever you do in your pre-processing does have a noticeable performance benefit so if you can provide us with information as to how this pre-processing affects the order of the data then we may be able to make use of this information.


ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs
_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

Reply via email to