I've got a scenario with a 2 million row table.  Data from inbound 
files gets processed into it.  
A new file might produce 10 resulting rows in this table...might produce 
40K...it depends on the source, day of month, etc.

I've got a process that parses the file and loads the records into the 
table, assigning a unique file_id to the overall load and stamping that value 
on each record.
Another process then performs a series of queries, joining against that table 
(for only the records with that file_id).  
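For context, the shape of the two steps is roughly this (table and column 
names here are made-up placeholders, not my real schema):

  -- Loader: every row parsed from one inbound file gets the same file_id.
  INSERT INTO staging_records (file_id, account_id, amount)
  SELECT 12345, account_id, amount
  FROM   parsed_file_temp;

  -- Downstream: the reporting queries join against only that file's rows.
  SELECT d.account_name, sum(s.amount)
  FROM   staging_records s
  JOIN   accounts d ON d.account_id = s.account_id
  WHERE  s.file_id = 12345
  GROUP  BY d.account_name;
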
The problem is that the planner has no idea how many records might exist for 
that file_id.  
If I throw a file_id at the planner that is not in the most-common-values 
list, it picks a nice number like 384 as its row count estimate.  
So when I reference a new file_id (which obviously isn't in the most-common-
values list yet, regardless of how many records I just loaded, because I 
haven't run ANALYZE yet), the planner dutifully estimates that I will get 
only 384 rows.  
For large files, that is off by 2 (or, god forbid, 3) orders of magnitude, 
and it yields very bad overall plans (regardless of the fact that I have 
indexes on the file_id column).
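As far as I can tell, for a value that isn't in most_common_vals the planner 
just falls back to an average based on n_distinct and the non-MCV fraction, 
which is where a flat guess like 384 comes from.  You can see the stats it is 
working from in pg_stats (table/column names assumed as above):

  SELECT n_distinct, most_common_vals, most_common_freqs
  FROM   pg_stats
  WHERE  tablename = 'staging_records'
    AND  attname   = 'file_id';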

It seems like I am in a no-win situation.  The query I am executing is fairly 
complex, and when the planner is off by multiple orders of magnitude on a 
row count, it goes way off the rails in terms of planning.

I COULD run ANALYZE after loading the file...but there is no guarantee that 
the file_id I just loaded will end up in the most-common-values list....and I 
end up with a bad plan anyway.
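
A sketch of that approach, plus raising the statistics target so ANALYZE 
keeps more file_id values in the MCV list (names assumed as above; 1000 is 
just an example target):

  ALTER TABLE staging_records ALTER COLUMN file_id SET STATISTICS 1000;
  -- after each file load:
  ANALYZE staging_records (file_id);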

Any thoughts?

