Dan Armbrust wrote:


>> What, ALWAYS faster, even for the first FK check when there's only one row in the target table and that's cached?
>>
>> If you're really in a hurry doing your bulk loads:
>>  1. Use COPY.
>>  2. Drop/restore the foreign-key constraints before/after.
>> That will be hugely faster than INSERTs, although it's not always an applicable solution.
>> --
>>   Richard Huxton
>>   Archonet Ltd

> It seems like the query planner goes to great lengths to avoid using indexes because it might take 5 ms longer to execute an index lookup on a table with one row. But then, when the table has 1 million rows, a full scan takes 3 minutes, and the index scan takes 3 seconds, it has no problem picking the 3-minute route. I'll gladly give up the 5 ms in return for not having to wait 3 minutes, which is why I disabled sequential scans. If I have a small table where indexes won't speed things up, I won't build an index on it.

> The other factor is that most of my tables have at least thousands, and usually millions, of rows. Sequential scans will never be faster for the queries I am doing - like I said, that is why I created the indexes.

The issue is nothing to do with special "small table" handling code. It's all to do with not having up-to-date stats. Of course, once you've analysed your table the system knows your index is good.
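
For example (table and column names invented here), once the bulk of the rows are in:

  -- Refresh the planner's statistics so it sees the real row count
  ANALYSE big_table;

  -- Verify that the plan now uses the index rather than a sequential scan
  EXPLAIN SELECT * FROM big_table WHERE lookup_code = 'X123';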

> My loading is done programmatically, from another format, so COPY is not an option.

Why not? A lot of my bulk-loads are generated from other systems, and I go through a temporary file/pipe via COPY when I can. When I don't, I batch the inserts into groups of e.g. 1000 and stick in an ANALYSE etc. as required.
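
Roughly like this, for instance (file and table names made up):

  -- Dump the converted data as tab-separated text and load it in one pass.
  -- (COPY FROM a file path reads server-side; from a client use \copy or STDIN.)
  COPY item_data (id, name, parent_id) FROM '/tmp/item_data.tsv';

  -- or feed it straight down a pipe from the converting program:
  --   converter | psql mydb -c "COPY item_data FROM STDIN"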

> Neither is removing foreign keys, as they are required to guarantee valid data.

Ah, but you can still guarantee your data. You can wrap the whole drop-FK, bulk-load, recreate-FK sequence in a single transaction, and it can still be faster. Obviously doing this on a high-activity table won't win, though: you'll have to block everyone else doing updates.
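
Something along these lines (table and constraint names invented):

  BEGIN;
  ALTER TABLE orders DROP CONSTRAINT orders_customer_fk;

  -- ... bulk-load into orders here, via COPY or batched INSERTs ...

  -- Re-adding the constraint checks all the rows in one pass,
  -- which is much cheaper than one FK lookup per inserted row.
  ALTER TABLE orders
    ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id);
  COMMIT;

If the ADD CONSTRAINT finds a bad row, the whole transaction rolls back, so the data is still guaranteed valid.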

> I don't really have a problem with the insert speed when it is working properly - it is on par with other DBs that I have on the same hardware. The problem is when it stops using the indexes, for no good reason.
>
> For example, last night I kicked off a load process - this morning, it had only managed to make it through about 600,000 rows (split across several tables). After restarting it this morning, it made it through the same data in 30 minutes. If that's not bad and buggy behavior, I don't know what is...

So run ANALYSE in parallel with your load, or break the bulk-load into blocks and analyse in-line. I'm not sure ripping out PG's cost-based query analyser will be a popular solution just to address bulk-loads.
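
If COPY really is out, the loader can do much the same with plain INSERTs, committing every thousand or so rows and analysing every few batches (table and column names made up):

  BEGIN;
  INSERT INTO concept (code, description) VALUES ('A01', 'first row');
  INSERT INTO concept (code, description) VALUES ('A02', 'second row');
  -- ... roughly 1000 rows per transaction ...
  COMMIT;

  -- every few batches, so the planner notices the table is no longer tiny:
  ANALYSE concept;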

--
  Richard Huxton
  Archonet Ltd
