> It sounds like the costing model might need a bit more work before we
> commit this.


I tried the simple SQL tests I posted a while ago again, and I still get
the same ratios.
I've tested the applied patch on a dual Opteron + disk array Solaris machine.

I really don't get how a laptop hard drive can be faster at reading data
using random seeks (required by the original cluster method) than seq scan
+ sort for the 5M rows test case.
Same thing for the "cluster vs bloat" test: the seq scan + sort is faster
on my machine.

I've just noticed that Josh used shared_buffers = 16MB for the "cluster vs
bloat" test, while I'm using a much higher value (I think something like
200MB), since with tables this big I thought a larger setting would be more
appropriate.
Maybe that's what makes the difference?
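
For anyone re-running the test, the setting is easy to compare (the 200MB
figure above is from memory, I haven't double-checked it):

SHOW shared_buffers;
-- in postgresql.conf (needs a server restart to take effect):
--   shared_buffers = 200MB        -- vs the 16MB Josh used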

Can someone else test the patch?

And: I don't have deep knowledge of how PostgreSQL deletes rows, but I
thought that something like:

DELETE FROM mybloat WHERE RANDOM() < 0.9;

would only delete the heap data, leaving the index entries in place; so the
patch should perform even better in this case (as it does, in fact, on my
test machine), because:

- the original cluster method would read the whole index, and fetch only
  the "still alive" rows
- the new method would read the table using a seq scan, and sort the few
  surviving rows in memory

But, as I said, maybe I'm getting this part wrong...
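
For what it's worth, the scenario I have in mind can be reproduced with
something like this (the table/index names and the 5M row count are just
placeholders, not Josh's exact script):

CREATE TABLE mybloat AS
    SELECT generate_series(1, 5000000) AS id,
           md5(random()::text) AS filler;
CREATE INDEX mybloat_id_idx ON mybloat (id);

-- leaves roughly 10% of the rows alive; the dead tuples stay in the heap
-- and the index entries stay behind until a VACUUM
DELETE FROM mybloat WHERE RANDOM() < 0.9;

-- original method: walk mybloat_id_idx and fetch the surviving rows
-- patched method: seq scan the (still bloated) heap and sort the survivors
CLUSTER mybloat USING mybloat_id_idx;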




