Re: [PERFORM] Constraint propagating for equal fields

2010-02-01 Thread Віталій Тимчишин
On 30 January 2010 at 04:30, Greg Stark gsst...@mit.edu wrote:

 2010/1/28 Віталій Тимчишин tiv...@gmail.com
 
  I've always thought that PostgreSQL would propagate a constraint from
 field1 to field2 when the condition says field1 = field2, but this does
 not seem to be the case:

 version?


PostgreSQL 8.3.7 on amd64-portbld-freebsd7.2, compiled by GCC cc (GCC) 4.2.1
20070719  [FreeBSD]
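For illustration, the kind of propagation being described might look like this (a hypothetical sketch; the table and column names are invented, not from the original report):

```sql
-- Given a join condition t1.field1 = t2.field2, one might expect a
-- restriction on field1 to be derived for field2's scan as well:
SELECT *
  FROM t1
  JOIN t2 ON t1.field1 = t2.field2
 WHERE t1.field1 > 100;
-- Expectation: the planner also applies "t2.field2 > 100" when scanning t2.
```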


Re: [PERFORM] Slow query: table iteration (8.3)

2010-02-01 Thread Yeb Havinga

Glenn Maynard wrote:

Hitting a performance issues that I'm not sure how to diagnose.

SELECT highscores_for_steps_and_card(s.id, 591, 1) FROM stomp_steps s;
Seq Scan on stomp_steps s  (cost=0.00..793.52 rows=2902 width=4)
(actual time=26509.919..26509.919 rows=0 loops=1)
Total runtime: 26509.972 ms
  

Hello Glenn,

Stomp_steps is analyzed to 2902 rows, but when you run the query the
actual row count is 0. That means either the highscore function is never
called, or the number 0 is incorrect.
Suppose the real number of rows is about 2900; then 26 seconds means
roughly 9 ms per function call. That is still a lot compared to the
0.054 ms analyze result below. You probably got that result by
explaining the query inside the function with actual parameter values.
That plan differs from the one made when the function is called from
SQL: there the query is planned once, without parameter values, and the
resulting plan is probably different. A way to check the plan of that
query is to turn on debug_print_plan and watch the server log. It takes
a bit of getting used to. The plan starts with CONTEXT: SQL function
functionname during startup, and is also recognizable because in the
opexpr (operator expression) one of the operands is a parameter. The
important number is the total cost of the top plan node (the limit).
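A minimal way to capture that plan in a session (both settings are standard PostgreSQL GUCs; the query is the one from the report above):

```sql
-- Log the internal plan tree of every statement planned in this session.
SET debug_print_plan = on;
-- Optionally echo LOG-level messages back to the client as well,
-- instead of digging through the server log:
SET client_min_messages = log;

SELECT highscores_for_steps_and_card(s.id, 591, 1) FROM stomp_steps s;
-- The nested function's plan appears with "CONTEXT: SQL function ..."
```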

I know 8.3 is mentioned in the subject, but I think that a WITH query
(http://www.postgresql.org/docs/8.4/interactive/queries-with.html) could
be a good solution to your problem and may be worth trying out, if you
have the possibility to try out 8.4.

Regards,
Yeb Havinga




The inner function looks like this:

CREATE FUNCTION highscores_for_steps_and_card(steps_id int, card_id
int, count int) RETURNS SETOF INTEGER LANGUAGE SQL AS $$
SELECT r.id FROM stomp_round r
WHERE ($1 IS NULL OR r.steps_id = $1) AND ($2 IS NULL OR
r.user_card_id = $2)
ORDER BY r.score DESC LIMIT $3
$$

 Limit  (cost=13.12..13.12 rows=1 width=8) (actual time=0.054..0.054
rows=0 loops=1)
   ->  Sort  (cost=13.12..13.12 rows=1 width=8) (actual
time=0.051..0.051 rows=0 loops=1)
         Sort Key: score
         Sort Method:  quicksort  Memory: 17kB
         ->  Bitmap Heap Scan on stomp_round r  (cost=9.09..13.11
rows=1 width=8) (actual time=0.036..0.036 rows=0 loops=1)
               Recheck Cond: ((280 = steps_id) AND (user_card_id = 591))
               ->  BitmapAnd  (cost=9.09..9.09 rows=1 width=0) (actual
time=0.032..0.032 rows=0 loops=1)
                     ->  Bitmap Index Scan on stomp_round_steps_id
(cost=0.00..4.40 rows=20 width=0) (actual time=0.030..0.030 rows=0
loops=1)
                           Index Cond: (280 = steps_id)
                     ->  Bitmap Index Scan on stomp_round_user_card_id
(cost=0.00..4.44 rows=25 width=0) (never executed)
                           Index Cond: (user_card_id = 591)
 Total runtime: 0.153 ms
(12 rows)

stomp_steps has about 1500 rows, so it finds 1500 high scores, one for
each stage.

I expected scalability issues from this on a regular drive, since
it'll be doing a ton of index seeking when not working out of cache,
so I expected to need to change to an SSD at some point (when it no
longer easily fits in cache).  However, I/O doesn't seem to be the
bottleneck yet.  If I run it several times, it consistently takes 26
seconds.  The entire database is in OS cache (find | xargs cat:
250ms).

I'm not sure why the full query (26s) is orders of magnitude slower
than 1500*0.150ms (225ms).  It's not a very complex query, and I'd
hope it's not being re-planned every iteration through the loop.  Any
thoughts?  Using SELECT to iterate over a table like this is very
useful (and I don't know any practical alternative), but it's
difficult to profile since it doesn't play nice with EXPLAIN ANALYZE.

  




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query: table iteration (8.3)

2010-02-01 Thread Glenn Maynard
On Mon, Feb 1, 2010 at 6:15 AM, Yeb Havinga yhavi...@gmail.com wrote:
 Glenn Maynard wrote:
 SELECT highscores_for_steps_and_card(s.id, 591, 1) FROM stomp_steps s;
 Seq Scan on stomp_steps s  (cost=0.00..793.52 rows=2902 width=4)
 (actual time=26509.919..26509.919 rows=0 loops=1)
 Total runtime: 26509.972 ms

 Stomp_steps is analyzed to 2902 rows but when you run the query the actual
 rows are 0. This means that the highscore function is not called or the
 number 0 is incorrect.

This SELECT returns 0 rows: it calls the function 1500 times, and each
time it returns no data, because there simply aren't any results for
these parameters.

 below. The truth might be that you probably got that result by explaining
 the query in the function with actual parameter values. This plan differs
 from the one that is made when the function is called from sql and is
 planned (once) without parameters, and in that case the plan is probably
 different.

Yeah.  It would help a lot if EXPLAIN could show query plans of
functions used by the statement and not just the top-level query.
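Not available on 8.3, but for what it's worth: 8.4 ships a contrib module, auto_explain, that can log the plans of statements nested inside functions. A sketch of how it might be used (the settings shown are the module's documented GUCs):

```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log every statement's plan
SET auto_explain.log_nested_statements = on;  -- include statements run
                                              -- inside functions
SELECT highscores_for_steps_and_card(s.id, 591, 1) FROM stomp_steps s;
```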

 A way to check the plan of that query is to turn on
 debug_print_plan and watch the server log. It takes a bit getting used. The
 plan starts with CONTEXT:  SQL function functionname during startup and is
 also recognized because in the opexpr (operator expression) one of the
 operands is a parameter. Important is the total cost of the top plan node
 (the limit).

Thanks.

SELECT highscores_for_steps_and_card(s.id, 591, 1) FROM stomp_steps s:

Squinting at the output, it definitely looks like a less optimized
plan; it's using a SEQSCAN instead of BITMAPHEAPSCAN.  (I've attached
the output.)

Does the planner not optimize functions based on context?  That seems
like a huge class of optimizations.  The first NULLTEST can be
optimized away, since that parameter comes from a NOT NULL source (a
PK).  The second NULLTEST can also be optimized away, since it's a
constant value (591).  The search could be a BITMAPHEAPSCAN,
substituting the s.id value for each call, instead of a SEQSCAN.  (Not
that I'm concerned about a few cheap NULLTESTs, I'm just surprised at
it using such a generic plan.)

If I create a new function with the constant parameters hard-coded,
it's back to BITMAPHEAPSCAN: 175ms.  This suggests a horrible
workaround: creating temporary functions every time I make this type
of query, with the fixed values substituted textually.  I'd really
love to know a less awful fix.
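A less drastic workaround than generating functions textually might be to rewrite the SQL function in plpgsql and build the query with EXECUTE, which plans the statement at call time with the actual values inlined. A sketch only (not from the thread; names are taken from the original function, and the FOR ... IN EXECUTE loop form is used because it works on 8.3):

```sql
CREATE OR REPLACE FUNCTION highscores_for_steps_and_card(
    p_steps_id int, p_card_id int, p_count int)
RETURNS SETOF integer LANGUAGE plpgsql AS $$
DECLARE
    q   text := 'SELECT r.id FROM stomp_round r WHERE true';
    rid integer;
BEGIN
    -- Inline the actual values, so the planner sees constants
    -- instead of generic parameters.
    IF p_steps_id IS NOT NULL THEN
        q := q || ' AND r.steps_id = ' || p_steps_id;
    END IF;
    IF p_card_id IS NOT NULL THEN
        q := q || ' AND r.user_card_id = ' || p_card_id;
    END IF;
    q := q || ' ORDER BY r.score DESC LIMIT ' || p_count;

    -- EXECUTE plans q afresh on every call.
    FOR rid IN EXECUTE q LOOP
        RETURN NEXT rid;
    END LOOP;
END;
$$;
```

The trade-off is one planning cycle per call, which should be cheap next to the 100x slowdown of the generic seqscan plan.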

 I know 8.3 is mentioned in the subject, but I think that a WITH query
 (http://www.postgresql.org/docs/8.4/interactive/queries-with.html) could be
 a good solution to your problem and may be worth trying out, if you have the
 possibility to try out 8.4.

I can't see how to apply WITH to this.  Non-recursive WITH seems like
syntax sugar that doesn't do anything a plain SELECT can't do, and I
don't think what I'm doing here can be done with a regular SELECT.

-- 
Glenn Maynard
SELECT highscores_for_steps_and_card(s.id, 591, 1) FROM stomp_steps s;
 QUERY PLAN

 Seq Scan on stomp_steps s  (cost=0.00..793.52 rows=2902 width=4) (actual 
time=1726.998..26729.717 rows=17 loops=1)
 Total runtime: 26729.822 ms

DEBUG:  plan:
DETAIL: {PLANNEDSTMT
   :commandType 1
   :canSetTag true
   :planTree
  {SEQSCAN
  :startup_cost 0.00
  :total_cost 793.52
  :plan_rows 2902
  :plan_width 4
  :targetlist (
 {TARGETENTRY
 :expr
{FUNCEXPR
:funcid 240532
:funcresulttype 23
:funcretset true
:funcformat 0
:args (
   {VAR
   :varno 1
   :varattno 1
   :vartype 23
   :vartypmod -1
   :varlevelsup 0
   :varnoold 1
   :varoattno 1
   }
   {CONST
   :consttype 23
   :consttypmod -1
   :constlen 4
   :constbyval true
   :constisnull false
   :constvalue 4 [ 79 2 0 0 ]
   }
   {CONST
   :consttype 23
   :consttypmod -1
   :constlen 4
   :constbyval true
   :constisnull false
   :constvalue 4 [ 1 0 0 0 ]
   }
)
}
 :resno 1
 :resname highscores_for_steps_and_card
 :ressortgroupref 0
 :resorigtbl 0
 :resorigcol 0
 :resjunk false
 }
  )
  :qual <>
  :lefttree <>
  :righttree <>
  :initPlan <>
  :extParam (b)
  :allParam (b)
  :scanrelid 1
  }
   :rtable (
  {RTE
  :alias
 {ALIAS
 :aliasname s
 :colnames 
 }
  :eref
 {ALIAS
 :aliasname s
 :colnames (id ...)

[PERFORM] the jokes for pg concurrency write performance

2010-02-01 Thread wyx6fox
hi, first, thanks for making such a good open-source db.

About half a year ago I began to use pg in a big insurance project. As
the project went on, I found some performance problems in concurrent
write situations, so I did some research on PostgreSQL's concurrent
write strategy.

I found a joke; maybe this concurrency strategy is the designer's proud
idea, but I think it is a joke. Let me describe the problems:

* joke 1: an insert operation takes an exclusive lock on the row
referenced by its foreign key. A big, big, big performance killer; I
think this is a stupid design.

* joke 2: concurrent updates on the same row make the other
transactions wait until the earlier transaction completes. This kills
concurrency in situations with long-running transactions. A stupid
design too.

The reason for this design is to avoid conflicts under read committed
isolation, as in this situation:
UPDATE webpages SET hits = hits + 1 WHERE url = 'some url';
With concurrent write transactions at read committed isolation, the
hits count may come out wrong.

This design serializes the writes, but I think no developer would
actually write code like this stupid sample. Good code would take an
explicit lock to serialize writes on the same row, but the joker thinks
he should do this for us. I say: you should not kill concurrency
performance just to protect this stupid sample code; I would use a
select ... for update to do it myself:

select 1 from lock_table where lockId = 'lock1' for update;
UPDATE webpages SET hits = hits + 1 WHERE url = 'some url';

* joke 3: updating 10 rows in a table with no index costs 5-8 seconds.
This is not acceptable in some bulk update situations.



Re: [PERFORM] use pgsql in a big project, but i found pg has some big problem on concurrency write operation, maybe a joke for myself !

2010-02-01 Thread Kevin Grittner
wrote:
 hi, first, thanks u for make so good opensource db .
 
Not a bad start.
 
 [insults and hand-waving]
 
Not a good way to continue.
 
If there's some particular performance problem you would like to try
to solve, please read these pages, and try again:
 
http://wiki.postgresql.org/wiki/SlowQueryQuestions
 
http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
 
It would also help if you posted something where the plain-text
format was a bit more readable.
 
-Kevin




Re: [PERFORM] the jokes for pg concurrency write performance

2010-02-01 Thread david

On Tue, 2 Feb 2010, wyx6...@sina.com wrote:


hi, first, thanks u for make so good opensource db .


First you thank the developers, then you insult them. Are you asking for
help, or just trying to cause problems?



About half a year ago I began to use pg in a big insurance project. As
the project went on, I found some performance problems in concurrent
write situations, so I did some research on PostgreSQL's concurrent
write strategy.

I found a joke; maybe this concurrency strategy is the designer's proud
idea, but I think it is a joke. Let me describe the problems:

* joke 1: an insert operation takes an exclusive lock on the row
referenced by its foreign key. A big, big, big performance killer; I
think this is a stupid design.


this I don't know enough to answer


* joke 2: concurrent updates on the same row make the other
transactions wait until the earlier transaction completes. This kills
concurrency in situations with long-running transactions. A stupid
design too.

The reason for this design is to avoid conflicts under read committed
isolation, as in this situation:
UPDATE webpages SET hits = hits + 1 WHERE url = 'some url';
With concurrent write transactions at read committed isolation, the
hits count may come out wrong.

This design serializes the writes, but I think no developer would
actually write code like this stupid sample. Good code would take an
explicit lock to serialize writes on the same row, but the joker thinks
he should do this for us. I say: you should not kill concurrency
performance just to protect this stupid sample code; I would use a
select ... for update to do it myself:

select 1 from lock_table where lockId = 'lock1' for update;
UPDATE webpages SET hits = hits + 1 WHERE url = 'some url';



If one transaction starts modifying a row, and then another one starts,
how do you not have one wait for the other? Remember that any
transaction can run for a long time and may be rolled back at any time.


Why would you want to lock the entire table for an update as simple as
the one you describe?



* joke 3: updating 10 rows in a table with no index costs 5-8 seconds.
This is not acceptable in some bulk update situations.


This one is easy: you did the 10 inserts as separate transactions. If
you do them all as one transaction (or, better still, as a COPY) they
will complete much faster.
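The difference can be sketched in psql (the table and file names here are illustrative, not from the thread):

```sql
-- Ten separate transactions: each statement pays its own
-- commit (and fsync) cost.
INSERT INTO t VALUES (1);
-- ... repeated ten times ...

-- One transaction: a single commit covers all ten rows.
BEGIN;
INSERT INTO t VALUES (1);
-- ... remaining rows ...
COMMIT;

-- Faster still for bulk loads: COPY ingests many rows in one statement.
COPY t FROM '/path/to/data.csv' WITH CSV;
```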


You seem to be assuming incompetence on the part of the developers
whenever you run into a problem. If you want them to help you, I would
suggest that you assume they know what they are doing (after all, if
they didn't, you wouldn't want to use their code for anything important
anyway), and instead ask what the right way is to do what you are trying
to do.


David Lang



Re: [PERFORM] the jokes for pg concurrency write performance

2010-02-01 Thread Richard Broersma

On Feb 1, 2010, at 8:57 PM, wyx6...@sina.com wrote:

i found a joke ,maybe this joke concurrency strategy is the  
designer's pround idea, but i think it is a joke , next let me  
describe the problems:


I would suggest that the behavior you dislike so much is not really the
idea of the postgresql developers so much as of Prof. Codd and the
ANSI SQL committee. I wonder whether a credible relational DBMS exists
that doesn't behave in exactly the way you've described.




UPDATE webpages SET hits = hits + 1 WHERE url ='some url ';

 i say ,u should no kill concurrency performance



One alternative design would be to log the timestamp of each web page
hit rather than update a hit-count field.

With this design, if the table becomes voluminous with historical logs,
you have the choice of using horizontal table partitioning, or you can
roll all of the historical logs up into an aggregating materialized
view (table).
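A sketch of that logging design (the table and column names here are illustrative, not from the thread):

```sql
CREATE TABLE webpage_hits (
    url    text        NOT NULL,
    hit_at timestamptz NOT NULL DEFAULT now()
);

-- Recording a hit is a plain INSERT, so concurrent sessions never
-- contend for the same row:
INSERT INTO webpage_hits (url) VALUES ('some url');

-- Periodically roll the log up into an aggregate table:
CREATE TABLE webpage_hit_counts AS
    SELECT url, count(*) AS hits
      FROM webpage_hits
     GROUP BY url;
```

The roll-up could run from cron, after which the rolled-up rows can be deleted from (or partitioned out of) the log table.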


Regarding all of the jokes you mentioned, I found them all to be very  
funny indeed.  :)


Regards,
Richard



[PERFORM] Sincere apology for my insults and hand-waving in these mails: the jokes for pg concurrency write performance

2010-02-01 Thread wyx6fox
I want to say to the developers of pg: thanks very much, you make such a
great project.

I think I am not a rough guy. Please forgive me.