Dan Armbrust wrote:
>> What, ALWAYS faster, even for the first FK check when there's only one
>> row in the target table and that's cached?
>> If you're really in a hurry doing your bulk loads:
>> 1. Use COPY.
>> 2. Drop/restore the foreign-key constraints before/after.
>> That will be hugely faster than INSERTs, although it's not always an
>> applicable solution.
> It seems like the query planner goes to great lengths to avoid using
> indexes because it might take 5 ms longer to execute an index lookup on
> a table with one row.
> But then, when the table has 1 million rows, a full scan takes 3
> minutes, and the index scan takes 3 seconds, it has no problem picking
> the 3-minute route.
> I'll gladly give up the 5 ms in return for not having to wait 3 minutes,
> which is why I disabled sequential scans. If I have a small table,
> where indexes won't speed things up, I won't build an index on it.
> The other factor is that most of my tables have at least thousands, and
> usually millions, of rows. Sequential scans will never be faster for the
> queries I am doing - like I said, that is why I created the indexes.
The issue is nothing to do with special "small table" handling code.
It's all to do with not having up-to-date stats. Of course, once you've
analysed your table the system knows your index is good.
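To see it for yourself (table and column names here are just placeholders),
compare the plan before and after an ANALYSE:

  -- after bulk-loading a million rows into big_table:
  EXPLAIN SELECT * FROM big_table WHERE id = 42;  -- stale stats: planner assumes a tiny table, picks a Seq Scan
  ANALYSE big_table;                              -- refresh row-count and distribution stats
  EXPLAIN SELECT * FROM big_table WHERE id = 42;  -- now it should pick the index scan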
> My loading is done programmatically, from another format, so COPY is not
> an option.
Why not? A lot of my bulk-loads are generated from other systems, and I
go through a temporary file/pipe via COPY when I can. When I don't, I
batch the inserts into groups of e.g. 1000 and stick in an ANALYSE etc.
as required.
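For instance, something like this (file path, table and column names are
made up for the sketch):

  -- client-side, from psql, reading a tab-separated file your converter wrote:
  \copy my_table (id, name) FROM '/tmp/my_table.tsv'

  -- or stream rows straight from the loading program through the driver:
  COPY my_table (id, name) FROM STDIN;
  -- ...send tab-separated rows, then a line containing only \.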
> Neither is removing foreign keys, as they are required to
> guarantee valid data.
Ah, but you can still guarantee your data. You can wrap the whole
drop-FK, bulk-load, recreate-FK sequence in a single transaction, and it
can still be faster. Obviously doing this on a high-activity table won't
be a win, though, since you'll have to block everyone else doing updates.
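Roughly like this (constraint, table and column names are invented for
the sketch):

  BEGIN;
  ALTER TABLE child DROP CONSTRAINT child_parent_fk;
  COPY child FROM '/tmp/child.tsv';
  ALTER TABLE child ADD CONSTRAINT child_parent_fk
      FOREIGN KEY (parent_id) REFERENCES parent (id);  -- re-adding the FK re-checks every row
  COMMIT;

If any row fails the re-check, the ADD CONSTRAINT errors out and the whole
transaction rolls back, so no invalid data ever becomes visible.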
> I don't really have a problem with the insert
> speed when it is working properly - it is on par with other DBs that I
> have on the same hardware. The problem is when it stops using the
> indexes, for no good reason.
> For example, last night I kicked off a load process - this morning, it
> had only managed to make it through about 600,000 rows (split across
> several tables). After restarting it this morning, it made it through
> the same data in 30 minutes.
> If that's not bad and buggy behavior, I don't know what is....
So run ANALYSE in parallel with your load, or break the bulk-load into
blocks and analyse in-line. I'm not sure ripping out PG's cost-based
query planner will be a popular solution just to address bulk-loads.
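Something along these lines, if the load is split into chunks (chunk file
and table names are placeholders):

  COPY parent FROM '/tmp/parent_chunk_01.tsv';
  ANALYSE parent;   -- keep the stats current so FK checks on later chunks use the index
  COPY parent FROM '/tmp/parent_chunk_02.tsv';
  ANALYSE parent;
  -- ...and so on for each chunk / each table being loaded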
--
Richard Huxton
Archonet Ltd