On Wed, Feb 18, 2009 at 12:39:50PM -0800, Mike Christensen wrote:
> I'm doing some perf testing and need huge amounts of data.  So I have a
> program that is adding data to a few tables ranging from 500,000 to 15M
> rows.
I assume you're repeatedly inserting data and then deleting it?  If so,
PG won't get much of a chance to clean up after you.  Because of the way
it handles transactions, all of the old data will be left in the table
until the table is vacuumed and the appropriate tuples/rows are marked
as deleted.

> The program is just a simple C# program that blasts data into the
> DB,

Just out of interest, do you know about the COPY command?  Things will
go much faster than with a large number of INSERT statements (there's a
sketch at the end of this message).

> but after about 3M rows or so I get an error:
>
> ERROR: could not extend relation 1663/41130/41177: No space left on device
> HINT: Check free disk space.
>
> If I do a full VACUUM on the table being inserted into, the error goes
> away but it comes back very quickly.  Obviously, I wouldn't want this
> happening in a production environment.

VACUUM FULLs should very rarely be done; routine maintenance would be to
run plain VACUUMs or let the autovacuum daemon handle things.  This will
mark the space as available, and subsequent operations will reuse it.

> What's the recommended setup in a production environment for tables
> where tons of data will be inserted?

If you're repeatedly inserting and deleting data then you'll probably
want to intersperse some VACUUMs in there.

> It seems to me there's some sort of "max table size" before you have to
> allocate more space on the disk, however I can't seem to find where
> these settings are and how to allow millions of rows to be inserted into
> a table without having to vacuum every few million rows..

There's no maximum table size you get control over; 15 million rows on
its own isn't considered particularly big, but you need to start being
careful at that stage.  If you've got a particularly "wide" table
(i.e. lots of attributes/columns) this is obviously going to take more
space, and you may consider normalizing the data out into separate
tables.  Once your row count gets to 10 or 100 times what you're dealing
with, you'd probably need to start thinking about partitioning the
tables (again, sketch at the end); how to do that would depend on your
usage patterns.

-- 
  Sam  http://samason.me.uk/
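
A rough sketch of what the COPY approach looks like; the table, column
and file names here are made up, not taken from your schema:

  CREATE TABLE perf_data (
      id         integer,
      payload    text,
      created_at timestamp
  );

  -- Server-side load: the path is resolved on the database server and
  -- must be readable by the postgres user.
  COPY perf_data (id, payload, created_at) FROM '/tmp/perf_data.tsv';

  -- From psql, the client-side equivalent is:
  --   \copy perf_data (id, payload, created_at) from 'perf_data.tsv'

Your C# driver may well expose COPY in some form too; worth checking its
docs before falling back to batched INSERTs.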
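
And the maintenance side, just to make the distinction concrete (same
made-up table name as above):

  -- Plain VACUUM: marks dead row versions as reusable; doesn't take an
  -- exclusive lock and doesn't shrink the file on disk.
  VACUUM ANALYZE perf_data;

  -- VACUUM FULL: compacts the table under an exclusive lock; rarely
  -- needed outside of reclaiming disk space after a big one-off delete.
  -- VACUUM FULL perf_data;

The autovacuum daemon runs the plain form for you; you'd only sprinkle
manual VACUUMs into a tight load/delete loop like yours.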
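
If you do get to the point of partitioning, the usual approach on 8.x is
inheritance plus CHECK constraints.  A rough sketch, again with made-up
names and assuming the data splits naturally by date:

  CREATE TABLE measurements (
      logdate  date NOT NULL,
      value    integer
  );

  CREATE TABLE measurements_2009_02 (
      CHECK (logdate >= DATE '2009-02-01' AND logdate < DATE '2009-03-01')
  ) INHERITS (measurements);

  -- Inserts need routing to the right child (in the application, or via
  -- a trigger/rule on the parent).  With constraint_exclusion on, the
  -- planner can skip children whose CHECK can't match the WHERE clause.
  SET constraint_exclusion = on;
  SELECT count(*) FROM measurements WHERE logdate >= DATE '2009-02-10';

How you'd actually split it (by date, by id range, etc.) depends
entirely on your usage patterns.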