Hi,

I have written some Java code which builds a PostgreSQL function.  That 
function runs approximately 6 INSERT statements with RETURNING clauses.  I 
recreate and re-run the function about 900,000 times.  I use JDBC to execute 
these functions on PostgreSQL 8.3 on Windows.  When I tried running this over a 
single PostgreSQL connection, it failed (some kind of memory error).  So I 
split the JDBC work up into chunks of 5000 calls per connection.  I reran it 
and everything was fine.  It took about 1 hour to execute all the updates.
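
For reference, the chunked version boils down to something like this (heavily 
simplified; the connection details, my_func, and my_table are placeholders for 
my real schema):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ChunkedLoad {
    private static final String URL = "jdbc:postgresql://localhost/mydb";

    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(URL, "user", "pass");
        Statement stmt = conn.createStatement();
        for (int i = 0; i < 900000; i++) {
            // redefine the function with this row's values baked in
            stmt.execute(buildCreateFunctionSql(i));
            // call it; inside, it runs about 6 INSERT ... RETURNING statements
            ResultSet rs = stmt.executeQuery("SELECT my_func()");
            rs.close();
            // start a fresh connection every 5000 calls (the "chunking")
            if ((i + 1) % 5000 == 0) {
                stmt.close();
                conn.close();
                conn = DriverManager.getConnection(URL, "user", "pass");
                stmt = conn.createStatement();
            }
        }
        stmt.close();
        conn.close();
    }

    // placeholder: the real function runs about 6 INSERT ... RETURNING statements
    private static String buildCreateFunctionSql(int i) {
        return "CREATE OR REPLACE FUNCTION my_func() RETURNS integer AS $$ "
             + "DECLARE new_id integer; "
             + "BEGIN "
             + "  INSERT INTO my_table (val) VALUES (" + i + ") RETURNING id INTO new_id; "
             + "  RETURN new_id; "
             + "END; $$ LANGUAGE plpgsql";
    }
}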

Since it took so long to perform the update, I wanted to prevent other users 
from querying the data during that time.  So I read about the LOCK command.  It 
seemed like I should LOCK all the tables in the database in ACCESS EXCLUSIVE 
mode.  That would prevent anyone from reading data while the database was 
making its updates.
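
What I had in mind is roughly this, run once at the start of the transaction 
(in reality the list would contain every table in the database; my_table below 
is just an example):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class LockHelper {
    // Takes an ACCESS EXCLUSIVE lock on each table.  Must be called inside a
    // transaction (autocommit off); the locks are held until commit/rollback.
    static void lockAllTables(Connection conn, String[] tables) throws SQLException {
        Statement stmt = conn.createStatement();
        for (String table : tables) {
            stmt.execute("LOCK TABLE " + table + " IN ACCESS EXCLUSIVE MODE");
        }
        stmt.close();
    }
}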

Since a LOCK is only valid for a single transaction, I set autocommit to 
FALSE.  I also removed the code which chunked up the inserts.  I had read that 
a single large transaction ought to perform better than committing after each 
insert, but that was clearly not what happened in my case.
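
So the single-transaction run looked roughly like this (a simplified fragment 
reusing URL, LockHelper, and buildCreateFunctionSql from the sketches above):

conn.setAutoCommit(false);                          // one big transaction
LockHelper.lockAllTables(conn, new String[] { "my_table" });  // really every table

Statement stmt = conn.createStatement();
for (int i = 0; i < 900000; i++) {
    stmt.execute(buildCreateFunctionSql(i));        // redefine the function
    stmt.executeQuery("SELECT my_func()").close();  // ~6 INSERT ... RETURNING per call
}
stmt.close();

conn.commit();                                      // single commit at the very end
conn.close();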

A few problems occurred.  First, the process ran for at least 8 hours and 
never finished.  It did not finish because the hard drive filled up.  Running a 
manual vacuum (VACUUM FULL) freed no space.  I think this has cost me 20 GB of 
space.  Is there any way to free this space up?  I even dropped the database, 
to no avail.

Second, why did this process take over 8 hours to run?  From reading the 
performance mailing list, the recommendation seems to be to run lots of 
INSERTs in a single commit.  Is 5 million too many?  Is redefining a function 
over and over inside a transaction a problem?  Does the RETURNING clause 
present a problem during a single transaction?
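
For comparison, my understanding of what the list recommends is plain INSERTs 
batched into one commit, something like this (placeholder table and column 
names; this uses JDBC batching, which is quite different from my function-based 
approach with RETURNING):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass");
        conn.setAutoCommit(false);
        PreparedStatement ps =
                conn.prepareStatement("INSERT INTO my_table (val) VALUES (?)");
        for (int i = 0; i < 5000000; i++) {
            ps.setInt(1, i);
            ps.addBatch();
            if ((i + 1) % 10000 == 0) {
                ps.executeBatch();      // send a batch of statements to the server
            }
        }
        ps.executeBatch();              // flush any remainder
        conn.commit();                  // single commit at the end
        ps.close();
        conn.close();
    }
}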

If anyone has any suggestions for me, I would really appreciate it.

Tina
