I have done the following test and I am unable to understand the results. I have tried debugging the code and I have reached down to the Storage Layer. I am playing with the optimizer etc.. I no very little about the internals of the Executor.
If you could point out to me what possible explanation for such anomaly I would be very glad. Thanks, Makarona My Test: Setup: ------- I have created two very similar tables mycorr_10 and mycorr_100, attribute names are {key,a,b} for both tables. I added 16 M rows in both tables in the following fashion: I gave a random value to each attribute key ( dont care ) Values in a,b take a random value from [1-16M] In the case of mycorr_10 I set a random 10% of the a=b In the case of mycorr_100 I set all a=b I create index{a,b} on both tables I VACUUM ANALYZE p.s. I am trying to simulate an optimizer cardinality estimation error due to Independence assumption. Query : SELECT count(key) FROM mycorr_10 -- (or mycorr_100) WHERE a>15900000 and b>15900000; Explain: ---------- As expected using the independence assumption the Planner chooses to use the index for both tables cases: Aggregate([4130.82][4130.83][1][94083.95][94083.96][1] width=4) -> Bitmap Heap Scan on mycorr_100([1997.92][4129.41][566][2021.57][93846.00][95177] width=4) Recheck Cond: ((a > 15900000) AND (b > 15900000)) -> Bitmap Index Scan on ab_100([0.00][1997.77][566][0.00][1997.77][95177] width=0) Index Cond: ((a > 15900000) AND (b > 15900000)) (5 rows) p.s. Explain output may seem weird as i have changes it a bit. Explain Analyze --------------------- restart postgres echo 1 > /proc/sys/vm/drop_caches (drop file system caches) explain analyze select count(key) from mycorr_10 where a>15900000 and b>15900000; restart postgres echo 1 > /proc/sys/vm/drop_caches explain analyze select count(key) from mycorr_100 where a>15900000 and b>15900000; Result for mycorr_100: --------------------------- Aggregate([4130.82][4130.83][1][94083.95][94083.96][1] width=4) (actual time=11424.077..11424.078 rows=1 loops=1) -> Bitmap Heap Scan on mycorr_100([1997.92][4129.41][566][2021.57][93846.00][95177] width=4) (actual time=167.979..11304.413 rows=100000 loops=1) Recheck Cond: ((a > 15900000) AND (b > 15900000)) -> Bitmap Index Scan on ab_100([0.00][1997.77][566][0.00][1997.77][95177] width=0) (actual time=120.127..120.127 rows=100000 loops=1) Index Cond: ((a > 15900000) AND (b > 15900000)) Total runtime: 11426.329 ms (6 rows) Result for mycorr_10: --------------------------- Aggregate([4608.36][4608.37][1][94197.91][94197.92][1] width=4) (actual time=24393.058..24393.058 rows=1 loops=1) -> Bitmap Heap Scan on mycorr_10([2249.51][4606.79][629][2272.83][93963.14][93908] width=4) (actual time=108.219..24374.050 rows=10563 loops=1) Recheck Cond: ((a > 15900000) AND (b > 15900000)) -> Bitmap Index Scan on ab_10([0.00][2249.35][629][0.00][2249.35][93908] width=0) (actual time=89.432..89.432 rows=10563 loops=1) Index Cond: ((a > 15900000) AND (b > 15900000)) Total runtime: 24393.555 ms (6 rows) ------------------------------------------------------------------------------------------------------------- Goodies: ----------- pg_statio_all_tables -> heap_blks_read=9931 (in case of mycorr_10) heap_blks_read=118693 (in case of mycorr_100) I have repeated the test more than 20 times up till now. I have also made the same test with different table sizes and correlation level and the same anomaly persists. Question: ------------ mycorr_100 took 11.4 s to run although it had to fetch 100000 row from the base table. mycorr_10 took 24.4 s to run although it had to fetch 10563 row from the base table. Any explanation for that? Thank you for your patience. ------------------------------------------------------------------------------------------------------------- __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com