> It's easy to have estimates that are off by a factor of two or three,
> though, so I think you'd frequently have situations when the query
> completed when the progress estimator was at 40% or 250%.

I thought about implementing a "given perfect estimates" indicator first and
then, as a second step, using histograms to improve the indicator's precision
at run time. Of course, this doesn't mean the user wouldn't see the query
complete at 40%, or appear to be "slowing down", in a lot of cases...
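
To put numbers on the concern quoted above: if the indicator simply divides
the rows processed so far by the planner's row estimate, the value shown when
the query finishes is the ratio of actual to estimated rows. A minimal sketch
of that arithmetic follows; the function name is hypothetical and nothing here
is taken from the patch.

```python
def progress_at_completion(actual_rows: int, estimated_rows: int) -> float:
    """Percentage a naive rows-done / rows-estimated indicator shows
    at the moment the query finishes."""
    if estimated_rows <= 0:
        raise ValueError("estimated_rows must be positive")
    return 100.0 * actual_rows / estimated_rows


# Planner overestimates the row count by 2.5x: the query ends at 40%.
print(progress_at_completion(1000, 2500))  # 40.0

# Planner underestimates by 2.5x: the indicator runs on to 250%.
print(progress_at_completion(2500, 1000))  # 250.0
```

An estimate that is off by a factor of two or three in either direction is
enough to produce both cases.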

I started this patch after reading the papers in 
Apparently they were able to predict query execution remaining time (in case of 
a "perfect estimates" query) with a very simple algorithm.

Given that:
1) The algorithm ("driver node hypothesis") is so easy 
2) My project fits in the category of "perfect estimates" queries

I thought "I'll give it a try".

Well: I have no idea how they got their results.

IMHO it's not possible to get at most 10% error on the remaining query time
for most of the TPC-D queries using that method, since all the "driver nodes"
are given the same "importance". I had to introduce a lot of complexity (not
in the patch that I posted) to get it "somehow" working, assigning the nodes a
different amount of work per tuple according to the node type. For example, in
a nested loop join the time it takes to read a row of the outer relation can't
be compared to, say, the time it takes to read a row from a table scan, but
the driver node hypothesis says they take the same time.
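
The difference between the two approaches can be sketched roughly as follows.
This is an illustration only, not the code from the patch: the node kinds, the
`DriverNode` class, and the per-row weights are all invented for the example,
and row totals are assumed to be known exactly (the "perfect estimates" case).

```python
from dataclasses import dataclass


@dataclass
class DriverNode:
    kind: str        # made-up label, e.g. "seqscan" or "nestloop_outer"
    rows_done: int   # rows produced so far
    rows_total: int  # rows the node will produce (assumed exact)


def progress(nodes, work_per_row=None):
    """Fraction of total work completed.

    With work_per_row=None every row of every driver node counts the
    same; otherwise each row is weighted by its node kind.
    """
    weights = work_per_row or {}
    done = sum(weights.get(n.kind, 1.0) * n.rows_done for n in nodes)
    total = sum(weights.get(n.kind, 1.0) * n.rows_total for n in nodes)
    return done / total if total else 1.0


nodes = [
    DriverNode("seqscan", rows_done=1000, rows_total=1000),
    DriverNode("nestloop_outer", rows_done=10, rows_total=100),
]

# Equal importance: the finished scan dominates the result.
uniform = progress(nodes)
print(round(uniform, 3))  # 0.918

# Hypothetical weights: one outer row costs as much as 100 scanned rows.
weighted = progress(nodes, {"seqscan": 1.0, "nestloop_outer": 100.0})
print(round(weighted, 3))  # 0.182
```

With equal weights the indicator reports about 92% although only 10 of the 100
outer rows have been processed; choosing the weights is where the extra
complexity comes from.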

So the code that I have right now works "pretty well" for the 10 queries of my
project, but I guess it won't work for general queries :(

> So, I'm all in favor of what you're trying to conceptually; I just
> don't like your proposed implementation.

What kind of implementation would you propose?

Thank you very much for your comments.

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)