Re: [PERFORM] stats are way off on 8.4 b1

2009-04-17 Thread Grzegorz Jaśkiewicz
2009/4/18 Tom Lane :
> Grzegorz Jaśkiewicz  writes:
>> That expected 1510 rows in 'assigments' seems to be pretty off,
>
> The planner does not trust an empty table to stay empty.  Every
> Postgres version in living memory has acted like that; it's not
> new to 8.4.

ok, thanks.
Quick question, Tom: can correlation be negative?
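(It can: the planner's correlation statistic runs from -1 to +1, going negative when a column's values descend through the heap. A minimal sketch of the idea — not the planner's actual code — computing a Pearson-style correlation of values against their physical positions:)

```python
# Minimal sketch: the correlation statistic is, roughly, the statistical
# correlation between a column's values and physical row order. It is
# -1 for perfectly descending data, +1 for perfectly ascending data.

def correlation(values):
    """Pearson correlation of values against their positions 0..n-1."""
    n = len(values)
    positions = range(n)
    mean_v = sum(values) / n
    mean_p = (n - 1) / 2
    cov = sum((v - mean_v) * (p - mean_p) for v, p in zip(values, positions))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_p = sum((p - mean_p) ** 2 for p in positions)
    return cov / (var_v * var_p) ** 0.5

print(correlation([1, 2, 3, 4, 5]))   # 1.0  (perfectly ascending)
print(correlation([5, 4, 3, 2, 1]))   # -1.0 (perfectly descending)
```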


-- 
GJ

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] stats are way off on 8.4 b1

2009-04-17 Thread Tom Lane
Grzegorz Jaśkiewicz  writes:
> That expected 1510 rows in 'assigments' seems to be pretty off,

The planner does not trust an empty table to stay empty.  Every
Postgres version in living memory has acted like that; it's not
new to 8.4.
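(For reference, a rough sketch of that default, written from memory of the estimate_rel_size logic in plancat.c — the constants are approximate: a zero-page table is assumed to occupy 10 pages, and rows-per-page is derived from the tuple width, which lands in the same ballpark as the 1510 seen here.)

```python
# Rough sketch (approximate constants, not the exact planner code):
# a relation with zero pages on disk is assumed to span 10 pages, and
# the row estimate is derived from average tuple width.

BLCKSZ = 8192        # default PostgreSQL block size
PAGE_HEADER = 24     # page header bytes (approximate)
TUPLE_OVERHEAD = 28  # per-tuple header + line pointer (approximate)

def estimated_rows(curpages, data_width):
    pages = curpages if curpages > 0 else 10   # never trust "empty"
    density = (BLCKSZ - PAGE_HEADER) // (data_width + TUPLE_OVERHEAD)
    return pages * density

# A zero-page table with width-16 tuples still gets a nonzero estimate,
# on the same order as the plan's rows=1510:
print(estimated_rows(curpages=0, data_width=16))
```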

regards, tom lane



[PERFORM] stats are way off on 8.4 b1

2009-04-17 Thread Grzegorz Jaśkiewicz
crawler=# select *  from assigments;
 jobid | timeout | workerid
---+-+--
(0 rows)

Time: 0.705 ms
crawler=# \d+ assigments
                                        Table "public.assigments"
  Column  |           Type           |                    Modifiers                    | Storage | Description
----------+--------------------------+-------------------------------------------------+---------+-------------
 jobid    | bigint                   | not null                                        | plain   |
 timeout  | timestamp with time zone | not null default (now() + '00:02:00'::interval) | plain   |
 workerid | bigint                   | not null                                        | plain   |
Indexes:
    "assigments_pkey" PRIMARY KEY, btree (jobid)
Foreign-key constraints:
    "assigments_jobid_fkey" FOREIGN KEY (jobid) REFERENCES jobs(id) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE
Has OIDs: no

crawler=# \d+ jobs
                                          Table "public.jobs"
   Column   |           Type           |                     Modifiers                     | Storage | Description
------------+--------------------------+---------------------------------------------------+---------+-------------
 id         | bigint                   | not null default nextval('jobs_id_seq'::regclass) | plain   |
 domainid   | bigint                   | not null                                          | plain   |
 priority   | smallint                 | not null default 1                                | plain   |
 added      | timestamp with time zone | not null default now()                            | plain   |
 notify_end | boolean                  | not null default false                            | plain   |
Indexes:
    "jobs_pkey" PRIMARY KEY, btree (domainid)
    "job_id_uidx" UNIQUE, btree (id)
    "foo" btree (notify_end DESC, priority DESC, added)
    "foo_bar" btree (notify_end, priority, added)
    "jobs_worker_priority_on_jobs" btree (calc_prio(notify_end, priority, added))
Foreign-key constraints:
    "jobs_domain_id_fkey" FOREIGN KEY (domainid) REFERENCES domains(id) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE
Referenced by:
    "assigments_jobid_fkey" IN assigments FOREIGN KEY (jobid) REFERENCES jobs(id) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE
Has OIDs: no

crawler=# explain analyze select * from full_assigments_view;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..11040.77 rows=1510 width=31) (actual time=0.003..0.003 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..10410.97 rows=1510 width=24) (actual time=0.002..0.002 rows=0 loops=1)
         ->  Seq Scan on assigments a  (cost=0.00..25.10 rows=1510 width=16) (actual time=0.002..0.002 rows=0 loops=1)
         ->  Index Scan using job_id_uidx on jobs j  (cost=0.00..6.87 rows=1 width=16) (never executed)
               Index Cond: (j.id = a.jobid)
   ->  Index Scan using domains_id_idx on domains d  (cost=0.00..0.40 rows=1 width=19) (never executed)
         Index Cond: (d.id = j.domainid)
 Total runtime: 0.123 ms
(8 rows)

Time: 1.390 ms

 View "public.full_assigments_view"
   Column|  Type   | Modifiers | Storage  | Description
-+-+---+--+-
 domain_name | text|   | extended |
 job_id  | bigint  |   | plain|
 timed_out   | boolean |   | plain|
View definition:
 SELECT d.name AS domain_name, j.id AS job_id,
        (now() - a.timeout) > '00:00:00'::interval AS timed_out
   FROM assigments a
   JOIN jobs j ON a.jobid = j.id
   JOIN domains d ON d.id = j.domainid;


default_statistics_target = 100
All the other settings are pretty much defaults.

That expected 1510 rows in 'assigments' seems to be pretty off,
especially since I just vacuumed/analyzed the db.
Any ideas?


-- 
GJ



Re: [PERFORM] GiST index performance

2009-04-17 Thread Matthew Wakeling

On Thu, 16 Apr 2009, Tom Lane wrote:
> Matthew, can you put together a self-contained test case with a similar
> slowdown?


I have done a bit of investigation, and I think I might have found the 
smoking gun I was looking for. I just added a load of debug output to the 
GiST consistent function on the bioseg type, and did a single overlap 
lookup in the index.


The index contains values ranging from 1 to 28,000,000 or so.
The range I looked up was 23503297..23504738 (so a very small proportion).
The index contains 375154 entries.
The index returned 59 rows.
The consistent method was called 54022 times - 5828 times for branch
(internal) index entries, and 48194 times for leaf entries.

Obviously this is a really bad index layout - scanning that many entries 
for such a small output. In fact, I saw lots of overlapping branch index 
entries, so the index isn't actually differentiating between the different 
branches of the tree very well. This indicates a failure of the picksplit 
or the penalty functions. I shall investigate this further next week.
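The cost of overlapping branch entries can be sketched with a toy two-branch "tree" (hypothetical code, not the bioseg implementation): when a bad picksplit leaves both branch bounding intervals covering nearly the whole key space, every query must descend into both, and the consistent check runs against every leaf underneath.

```python
# Toy illustration (not the bioseg code): overlapping branch entries in a
# GiST-like tree force consistent() calls over far more leaf entries.

def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

def search(branches, query):
    """Count leaf entries tested: every leaf under a branch whose
    bounding interval overlaps the query must be checked."""
    tested = 0
    for bound, leaves in branches:
        if overlaps(bound, query):      # consistent() on the branch entry
            tested += len(leaves)       # consistent() on each leaf below it
    return tested

leaves = [(i, i + 1) for i in range(0, 1000, 2)]  # 500 disjoint segments

# Good picksplit: the two branches cover disjoint key ranges.
good = [((0, 499), leaves[:250]), ((500, 999), leaves[250:])]

# Bad picksplit: both branch bounding intervals span nearly everything,
# so any query descends into both halves of the tree.
bad = [((0, 999), leaves[:250]), ((0, 999), leaves[250:])]

print(search(good, (700, 710)))  # 250 leaves tested (one branch pruned)
print(search(bad, (700, 710)))   # 500 leaves tested (nothing pruned)
```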


I shall also investigate whether this is the exact same problem that I had 
with the int4 gist system.


Matthew

--
So, given 'D' is undeclared too, with a default of zero, C++ is equal to D.
 mnw21, commenting on the "Surely the value of C++ is zero, but C is now 1"
 response to "No, C++ isn't equal to D. 'C' is undeclared [...] C++ should
 really be called 1" response to "C++ -- shouldn't it be called D?"



Re: [PERFORM] No hash join across partitioned tables?

2009-04-17 Thread Kris Jurka

Tom Lane wrote:
> Kris Jurka  writes:
>> The hash join takes less than twenty seconds, the other two joins I
>> killed after five minutes.  I can try to collect explain analyze results
>> later today if you'd like.

Attached are the explain analyze results.  The analyze part hits the 
hash join worst of all, so I've also included the timings without analyzing.


Method     Time (ms)   Time w/Analyze (ms)
nestloop      304853                319060
merge         514517                683757
hash           18957                143731
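That analyze overhead tracks the number of rows flowing through instrumented plan nodes — the hash join pumps ~18.45M rows through both a Seq Scan and an Append — so a back-of-envelope sketch (with an assumed per-row timer cost) lands in the right ballpark for the ~125 s gap:

```python
# Back-of-envelope sketch (per-row timer cost is an assumption): EXPLAIN
# ANALYZE wraps clock calls around every row in every instrumented node,
# so plans that move many rows pay proportionally more overhead.

TIMER_NS = 3000  # assumed instrumentation cost per row per node, in ns

def analyze_overhead_ms(rows_through_nodes):
    """Rough added wall time from EXPLAIN ANALYZE, in milliseconds."""
    return rows_through_nodes * TIMER_NS / 1e6

# ~18.45M rows through the scans plus the same rows through the Append:
print(analyze_overhead_ms(2 * 18_450_000))  # ~110,000 ms of pure overhead
```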

Kris Jurka
 Aggregate  (cost=116546928.75..116546928.76 rows=1 width=0) (actual time=143731.602..143731.604 rows=1 loops=1)
   ->  Hash Join  (cost=1470.48..108802907.84 rows=3097608361 width=0) (actual time=308.015..143724.055 rows=2437 loops=1)
         Hash Cond: ((l.vin)::text = (i.vin)::text)
         ->  Append  (cost=0.00..332098.75 rows=18450775 width=18) (actual time=0.069..99984.899 rows=18449996 loops=1)
               ->  Seq Scan on liens l  (cost=0.00..14.00 rows=400 width=21) (actual time=0.003..0.003 rows=0 loops=1)
               ->  Seq Scan on liens_s1 l  (cost=0.00..18633.44 rows=917444 width=18) (actual time=0.060..1828.740 rows=917444 loops=1)
               ->  Seq Scan on liens_s2 l  (cost=0.00..20.92 rows=1192 width=18) (actual time=0.010..2.274 rows=1192 loops=1)
               ->  Seq Scan on liens_s3 l  (cost=0.00..53793.79 rows=2934179 width=18) (actual time=0.054..5777.782 rows=2934179 loops=1)
               ->  Seq Scan on liens_s4 l  (cost=0.00..21069.39 rows=1214139 width=18) (actual time=0.065..2413.429 rows=1214139 loops=1)
               ->  Seq Scan on liens_s5 l  (cost=0.00..29966.37 rows=1726837 width=18) (actual time=0.046..3394.974 rows=1726837 loops=1)
               ->  Seq Scan on liens_s6 l  (cost=0.00..10587.18 rows=462918 width=18) (actual time=0.053..936.379 rows=462918 loops=1)
               ->  Seq Scan on liens_s7 l  (cost=0.00..14.00 rows=400 width=21) (actual time=0.003..0.003 rows=0 loops=1)
               ->  Seq Scan on liens_s8 l  (cost=0.00..86004.68 rows=4956168 width=18) (actual time=0.045..9729.965 rows=4956182 loops=1)
               ->  Seq Scan on liens_s9 l  (cost=0.00..320.29 rows=18429 width=18) (actual time=0.010..34.880 rows=18429 loops=1)
               ->  Seq Scan on liens_s10 l  (cost=0.00..18398.16 rows=951016 width=18) (actual time=0.055..1889.948 rows=951016 loops=1)
               ->  Seq Scan on liens_s11 l  (cost=0.00..9956.22 rows=543022 width=18) (actual time=0.055..1070.156 rows=543022 loops=1)
               ->  Seq Scan on liens_s12 l  (cost=0.00..78813.85 rows=4541785 width=18) (actual time=0.012..9431.035 rows=4541792 loops=1)
               ->  Seq Scan on liens_s13 l  (cost=0.00..4506.46 rows=182846 width=18) (actual time=0.049..374.788 rows=182846 loops=1)
         ->  Hash  (cost=1050.77..1050.77 rows=33577 width=18) (actual time=256.374..256.374 rows=33297 loops=1)
               ->  Append  (cost=0.00..1050.77 rows=33577 width=18) (actual time=0.019..188.152 rows=33297 loops=1)
                     ->  Seq Scan on impounds i  (cost=0.00..11.40 rows=140 width=21) (actual time=0.002..0.002 rows=0 loops=1)
                     ->  Seq Scan on impounds_s1 i  (cost=0.00..11.40 rows=140 width=21) (actual time=0.002..0.002 rows=0 loops=1)
                     ->  Seq Scan on impounds_s2 i  (cost=0.00..913.87 rows=29587 width=18) (actual time=0.008..60.728 rows=29587 loops=1)
                     ->  Seq Scan on impounds_s3 i  (cost=0.00..18.14 rows=414 width=18) (actual time=0.009..0.848 rows=414 loops=1)
                     ->  Seq Scan on impounds_s4 i  (cost=0.00..95.96 rows=3296 width=18) (actual time=0.012..6.894 rows=3296 loops=1)
 Total runtime: 143731.788 ms
(26 rows)

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=57241210.61..57241210.62 rows=1 width=0) (actual time=683467.350..683467.352 rows=1 loops=1)
   ->  Merge Join  (cost=2940810.41..49497189.70 rows=3097608361 width=0) (actual time=434026.342..683460.545 rows=2437 loops=1)
         Merge Cond: ((l.vin)::text = (i.vin)::text)
         ->  Sort  (cost=2937235.46..2983362.40 rows=18450775 width=18) (actual time=433519.957..637389.755 rows=18449961 loops=1)
               Sort Key: l.vin
               Sort Method:  external merge  Disk: 504728kB
               ->  Append  (cost=0.00..332098.75 rows=18450775 width=18) (actual time=14.764..102905.170 rows=18449996 loops=1)
                     ->  Seq Scan on liens l  (cost=0.00..14.00 rows=400 width=21) (actual time=0.003..0.003 rows=0 loops=1)
                     ->  Seq Scan on liens_s1 l  (cost=0.00..18633.44 rows=917444 width=18) (actual time=14.755..2167.668 rows=917444 loops=1)
                     ->  Seq Scan on liens_s2 l  (cost=0.00..20.92 rows=1192 width=18) (actual time=0.012..2.

Re: [PERFORM] No hash join across partitioned tables?

2009-04-17 Thread Kris Jurka

Tom Lane wrote:
> Is there another issue here besides that one?  I think you were hoping
> that the hash join would be faster than the alternatives, but the cost
> estimate says it's a lot slower.  Is that actually the case?

The hash join takes less than twenty seconds, the other two joins I 
killed after five minutes.  I can try to collect explain analyze results 
later today if you'd like.


Kris Jurka



Re: [PERFORM] No hash join across partitioned tables?

2009-04-17 Thread Tom Lane
Kris Jurka  writes:
> The hash join takes less than twenty seconds, the other two joins I 
> killed after five minutes.  I can try to collect explain analyze results 
> later today if you'd like.

Please, unless the test case you already posted has similar behavior.

regards, tom lane



Re: [PERFORM] No hash join across partitioned tables?

2009-04-17 Thread Tom Lane
Kris Jurka  writes:
> So the default disable_cost isn't enough to push it to use the hash join 
> plan and goes back to nestloop.  Since disable_cost hasn't been touched 
> since January 2000, perhaps it's time to bump that up to match today's 
> hardware and problem sizes?

I think disable_cost was originally set at a time when costs were
integers :-(.  Yeah, there's probably no reason not to throw another
zero or two on it.
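(The failure mode is easy to sketch with illustrative numbers: enable_nestloop = off does not remove nestloop paths, it adds a fixed disable_cost penalty to them, so once a rival plan's own estimate exceeds disable_cost the "disabled" plan can still compare as cheaper.)

```python
# Illustrative sketch (hypothetical numbers, not the planner source):
# enable_* = off adds a fixed penalty rather than removing the path.

DISABLE_COST = 1.0e8  # the era's disable_cost constant

def effective_cost(cost, disabled=False):
    """Cost the planner actually compares, with the disable penalty."""
    return cost + (DISABLE_COST if disabled else 0.0)

nestloop_cost = 1.5e7    # hypothetical nestloop estimate
hashjoin_cost = 1.165e8  # hash join estimate from the plan in this thread

# The penalized nestloop still compares as cheaper than the hash join,
# because the hash join's own estimate already exceeds disable_cost:
print(effective_cost(nestloop_cost, disabled=True) <
      effective_cost(hashjoin_cost))  # True -> nestloop wins anyway
```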

Is there another issue here besides that one?  I think you were hoping
that the hash join would be faster than the alternatives, but the cost
estimate says it's a lot slower.  Is that actually the case?

regards, tom lane
