Re: [PERFORM] stats are way off on 8.4 b1
2009/4/18 Tom Lane : > =?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= writes: >> That expected 1510 rows in 'assigments' seems to be pretty off, > > The planner does not trust an empty table to stay empty. Every > Postgres version in living memory has acted like that; it's not > new to 8.4. ok, thanks Quick question Tom. Can correlation be negative ? -- GJ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] stats are way off on 8.4 b1
=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= writes: > That expected 1510 rows in 'assigments' seems to be pretty off, The planner does not trust an empty table to stay empty. Every Postgres version in living memory has acted like that; it's not new to 8.4. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] stats are way off on 8.4 b1
crawler=# select * from assigments; jobid | timeout | workerid ---+-+-- (0 rows) Time: 0.705 ms crawler=# \d+ assigments Table "public.assigments" Column | Type |Modifiers | Storage | Description --+--+-+-+- jobid| bigint | not null | plain | timeout | timestamp with time zone | not null default (now() + '00:02:00'::interval) | plain | workerid | bigint | not null | plain | Indexes: "assigments_pkey" PRIMARY KEY, btree (jobid) Foreign-key constraints: "assigments_jobid_fkey" FOREIGN KEY (jobid) REFERENCES jobs(id) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE Has OIDs: no crawler=# \d+ jobs Table "public.jobs" Column | Type | Modifiers | Storage | Description +--+---+-+- id | bigint | not null default nextval('jobs_id_seq'::regclass) | plain | domainid | bigint | not null | plain | priority | smallint | not null default 1 | plain | added | timestamp with time zone | not null default now() | plain | notify_end | boolean | not null default false | plain | Indexes: "jobs_pkey" PRIMARY KEY, btree (domainid) "job_id_uidx" UNIQUE, btree (id) "foo" btree (notify_end DESC, priority DESC, added) "foo_bar" btree (notify_end, priority, added) "jobs_worker_priority_on_jobs" btree (calc_prio(notify_end, priority, added)) Foreign-key constraints: "jobs_domain_id_fkey" FOREIGN KEY (domainid) REFERENCES domains(id) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE Referenced by: "assigments_jobid_fkey" IN assigments FOREIGN KEY (jobid) REFERENCES jobs(id) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE Has OIDs: no crawler=# explain analyze select * from full_assigments_view; QUERY PLAN --- Nested Loop (cost=0.00..11040.77 rows=1510 width=31) (actual time=0.003..0.003 rows=0 loops=1) -> Nested Loop (cost=0.00..10410.97 rows=1510 width=24) (actual time=0.002..0.002 rows=0 loops=1) -> Seq Scan on assigments a (cost=0.00..25.10 rows=1510 width=16) (actual time=0.002..0.002 rows=0 loops=1) -> Index Scan using job_id_uidx on jobs j (cost=0.00..6.87 rows=1 width=16) (never executed) Index Cond: (j.id = a.jobid) -> Index Scan using domains_id_idx on domains d (cost=0.00..0.40 rows=1 width=19) (never executed) Index Cond: (d.id = j.domainid) Total runtime: 0.123 ms (8 rows) Time: 1.390 ms View "public.full_assigments_view" Column| Type | Modifiers | Storage | Description -+-+---+--+- domain_name | text| | extended | job_id | bigint | | plain| timed_out | boolean | | plain| View definition: SELECT d.name AS domain_name, j.id AS job_id, (now() - a.timeout) > '00:00:00'::interval AS timed_out FROM assigments a JOIN jobs j ON a.jobid = j.id JOIN domains d ON d.id = j.domainid; default_statistics_target=100 all the other settings are pretty much default, That expected 1510 rows in 'assigments' seems to be pretty off, especially since I just vacuumed/analyze the db. Any ideas ? -- GJ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] GiST index performance
On Thu, 16 Apr 2009, Tom Lane wrote: Matthew, can you put together a self-contained test case with a similar slowdown? I have done a bit of investigation, and I think I might have found the smoking gun I was looking for. I just added a load of debug to the gist consistent function on the bioseg type, and did a single overlap lookup in the index. The index contains values ranging from 1 to 28,000,000 or so. The range I looked up was 23503297..23504738 (so a very small proportion). The index contains 375154 entries. The index returned 59 rows. The consistent method was called 54022 times - 5828 times for branch (internal) index entries, and 48194 times for leaf entries. Obviously this is a really bad index layout - scanning that many entries for such a small output. In fact, I saw lots of overlapping branch index entries, so the index isn't actually differentiating between the different branches of the tree very well. This indicates a failure of the picksplit or the penalty functions. I shall investigate this further next week. I shall also investigate whether this is the exact same problem that I had with the int4 gist system. Matthew -- So, given 'D' is undeclared too, with a default of zero, C++ is equal to D. mnw21, commenting on the "Surely the value of C++ is zero, but C is now 1" response to "No, C++ isn't equal to D. 'C' is undeclared [...] C++ should really be called 1" response to "C++ -- shouldn't it be called D?" -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] No hash join across partitioned tables?
Tom Lane wrote: Kris Jurka writes: The hash join takes less than twenty seconds, the other two joins I killed after five minutes. I can try to collect explain analyze results later today if you'd like. Attached are the explain analyze results. The analyze part hits the hash join worst of all, so I've also included the timings without analyzing. Method Time (ms) Time w/Analyze (ms) nestloop 304853 319060 merge514517 683757 hash 18957 143731 Kris Jurka Aggregate (cost=116546928.75..116546928.76 rows=1 width=0) (actual time=143731.602..143731.604 rows=1 loops=1) -> Hash Join (cost=1470.48..108802907.84 rows=3097608361 width=0) (actual time=308.015..143724.055 rows=2437 loops=1) Hash Cond: ((l.vin)::text = (i.vin)::text) -> Append (cost=0.00..332098.75 rows=18450775 width=18) (actual time=0.069..99984.899 rows=18449996 loops=1) -> Seq Scan on liens l (cost=0.00..14.00 rows=400 width=21) (actual time=0.003..0.003 rows=0 loops=1) -> Seq Scan on liens_s1 l (cost=0.00..18633.44 rows=917444 width=18) (actual time=0.060..1828.740 rows=917444 loops=1) -> Seq Scan on liens_s2 l (cost=0.00..20.92 rows=1192 width=18) (actual time=0.010..2.274 rows=1192 loops=1) -> Seq Scan on liens_s3 l (cost=0.00..53793.79 rows=2934179 width=18) (actual time=0.054..5777.782 rows=2934179 loops=1) -> Seq Scan on liens_s4 l (cost=0.00..21069.39 rows=1214139 width=18) (actual time=0.065..2413.429 rows=1214139 loops=1) -> Seq Scan on liens_s5 l (cost=0.00..29966.37 rows=1726837 width=18) (actual time=0.046..3394.974 rows=1726837 loops=1) -> Seq Scan on liens_s6 l (cost=0.00..10587.18 rows=462918 width=18) (actual time=0.053..936.379 rows=462918 loops=1) -> Seq Scan on liens_s7 l (cost=0.00..14.00 rows=400 width=21) (actual time=0.003..0.003 rows=0 loops=1) -> Seq Scan on liens_s8 l (cost=0.00..86004.68 rows=4956168 width=18) (actual time=0.045..9729.965 rows=4956182 loops=1) -> Seq Scan on liens_s9 l (cost=0.00..320.29 rows=18429 width=18) (actual time=0.010..34.880 rows=18429 loops=1) -> Seq Scan on liens_s10 l (cost=0.00..18398.16 rows=951016 width=18) (actual time=0.055..1889.948 rows=951016 loops=1) -> Seq Scan on liens_s11 l (cost=0.00..9956.22 rows=543022 width=18) (actual time=0.055..1070.156 rows=543022 loops=1) -> Seq Scan on liens_s12 l (cost=0.00..78813.85 rows=4541785 width=18) (actual time=0.012..9431.035 rows=4541792 loops=1) -> Seq Scan on liens_s13 l (cost=0.00..4506.46 rows=182846 width=18) (actual time=0.049..374.788 rows=182846 loops=1) -> Hash (cost=1050.77..1050.77 rows=33577 width=18) (actual time=256.374..256.374 rows=33297 loops=1) -> Append (cost=0.00..1050.77 rows=33577 width=18) (actual time=0.019..188.152 rows=33297 loops=1) -> Seq Scan on impounds i (cost=0.00..11.40 rows=140 width=21) (actual time=0.002..0.002 rows=0 loops=1) -> Seq Scan on impounds_s1 i (cost=0.00..11.40 rows=140 width=21) (actual time=0.002..0.002 rows=0 loops=1) -> Seq Scan on impounds_s2 i (cost=0.00..913.87 rows=29587 width=18) (actual time=0.008..60.728 rows=29587 loops=1) -> Seq Scan on impounds_s3 i (cost=0.00..18.14 rows=414 width=18) (actual time=0.009..0.848 rows=414 loops=1) -> Seq Scan on impounds_s4 i (cost=0.00..95.96 rows=3296 width=18) (actual time=0.012..6.894 rows=3296 loops=1) Total runtime: 143731.788 ms (26 rows) QUERY PLAN --- Aggregate (cost=57241210.61..57241210.62 rows=1 width=0) (actual time=683467.350..683467.352 rows=1 loops=1) -> Merge Join (cost=2940810.41..49497189.70 rows=3097608361 width=0) (actual time=434026.342..683460.545 rows=2437 loops=1) Merge Cond: ((l.vin)::text = (i.vin)::text) -> Sort (cost=2937235.46..2983362.40 rows=18450775 width=18) (actual time=433519.957..637389.755 rows=18449961 loops=1) Sort Key: l.vin Sort Method: external merge Disk: 504728kB -> Append (cost=0.00..332098.75 rows=18450775 width=18) (actual time=14.764..102905.170 rows=18449996 loops=1) -> Seq Scan on liens l (cost=0.00..14.00 rows=400 width=21) (actual time=0.003..0.003 rows=0 loops=1) -> Seq Scan on liens_s1 l (cost=0.00..18633.44 rows=917444 width=18) (actual time=14.755..2167.668 rows=917444 loops=1) -> Seq Scan on liens_s2 l (cost=0.00..20.92 rows=1192 width=18) (actual time=0.012..2.
Re: [PERFORM] No hash join across partitioned tables?
Tom Lane wrote: Is there another issue here besides that one? I think you were hoping that the hash join would be faster than the alternatives, but the cost estimate says it's a lot slower. Is that actually the case? The hash join takes less than twenty seconds, the other two joins I killed after five minutes. I can try to collect explain analyze results later today if you'd like. Kris Jurka -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] No hash join across partitioned tables?
Kris Jurka writes: > The hash join takes less than twenty seconds, the other two joins I > killed after five minutes. I can try to collect explain analyze results > later today if you'd like. Please, unless the test case you already posted has similar behavior. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] No hash join across partitioned tables?
Kris Jurka writes: > So the default disable_cost isn't enough to push it to use the hash join > plan and goes back to nestloop. Since disable_cost hasn't been touched > since January 2000, perhaps it's time to bump that up to match today's > hardware and problem sizes? I think disable_cost was originally set at a time when costs were integers :-(. Yeah, there's probably no reason not to throw another zero or two on it. Is there another issue here besides that one? I think you were hoping that the hash join would be faster than the alternatives, but the cost estimate says it's a lot slower. Is that actually the case? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance