Re: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)

2014-11-03 Thread Igor Neyman


-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of p...@cmicdo.com
Sent: Monday, November 03, 2014 11:34 AM
To: pgsql-general@postgresql.org
Subject: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = 
(SELECT ...)

Why does the UPDATE SET = FROM choose a more poorly performing plan than the 
UPDATE SET = (SELECT ...)?  It seems to me that it is the same join.
 
I'm using 9.3.5.  

CREATE TABLE orig
(
key1VARCHAR(11) PRIMARY KEY,
time1   TIME
);

INSERT INTO orig (key1, time1)
SELECT 
a::TEXT,
(((random()*100)::INT % 24)::TEXT || ':' ||
((random()*100)::INT % 60)::TEXT)::TIME FROM 
generate_series(800, 8000200) a;

CREATE INDEX odx ON orig(key1);

CREATE TABLE second (LIKE orig);
 
INSERT INTO second (key1) 
SELECT (800+(((random()*100)::INT) % 100))::TEXT
FROM generate_series(1,40);

EXPLAIN ANALYZE
UPDATE second SET time1 = orig.time1
FROM orig
WHERE second.key1 = orig.key1;

 QUERY PLAN

 Update on second  (cost=69461.02..106082.02 rows=40 width=32) (actual 
time=16033.023..16033.023 rows=0 loops=1)
   -  Hash Join  (cost=69461.02..106082.02 rows=40 width=32) (actual 
time=7698.445..12992.039 rows=40 loops=1)
 Hash Cond: ((second.key1)::text = (orig.key1)::text)
 -  Seq Scan on second  (cost=0.00..12627.00 rows=40 width=18) 
(actual time=49.820..791.397 rows=40 loops=1)
 -  Hash  (cost=31765.01..31765.01 rows=201 width=26) (actual 
time=7648.540..7648.540 rows=201 loops=1)
   Buckets: 4096  Batches: 128  Memory Usage: 717kB
   -  Seq Scan on orig  (cost=0.00..31765.01 rows=201 
width=26) (actual time=0.014..3655.844 rows=201 loops=1)  Total runtime: 
16033.193 ms
(8 rows)

UPDATE second SET time1 = NULL;

EXPLAIN ANALYZE
UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second
WHERE orig.key1 = second.key1 LIMIT 1);


   QUERY PLAN

 Update on second  (cost=3.60..19078.19 rows=1279959 width=18) (actual 
time=4642.453..4642.453 rows=0 loops=1)
   InitPlan 1 (returns $1)
 -  Limit  (cost=0.43..3.60 rows=1 width=8) (actual time=2.611..2.613 
rows=1 loops=1)
   -  Nested Loop  (cost=0.43..4056331.83 rows=1279959 width=8) 
(actual time=2.606..2.606 rows=1 loops=1)
 -  Seq Scan on second second_1  (cost=0.00..19074.59 
rows=1279959 width=12) (actual time=2.487..2.487 rows=1 loops=1)
 -  Index Scan using odx on orig  (cost=0.43..3.14 rows=1 
width=20) (actual time=0.098..0.098 rows=1 loops=1)
   Index Cond: ((key1)::text = (second_1.key1)::text)
   -  Seq Scan on second  (cost=0.00..19074.59 rows=1279959 width=18) (actual 
time=6.420..817.739 rows=40 loops=1)  Total runtime: 4642.561 ms
(9 rows)


These 2 queries are not the same.

The first query updates rows in the second table with the orig.time1 values 
based on key1 column match.
The second query finds first possible match (based on key1 column) and assigns 
orig.time1 value from the matched row to every record in second table.

Regards,
Igor Neyman



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)

2014-11-03 Thread pbj

  
  On Mon, 11/3/14, Igor Neyman iney...@perceptron.com wrote:
 
   -Original Message-
   From: pgsql-general-ow...@postgresql.org
   [mailto:pgsql-general-ow...@postgresql.org]  
   On Behalf Of p...@cmicdo.com
   Sent: Monday, November 03, 2014 11:34 AM
   To: pgsql-general@postgresql.org
   Subject: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM
   vs UPDATE SET = (SELECT ...)
   
   Why does the UPDATE SET = FROM choose a more poorly
   performing plan than the UPDATE SET = (SELECT ...)?  It
   seems to me that it is the same join. 
 
   I'm using 9.3.5.
 
   CREATE TABLE orig
   (   
   key1VARCHAR(11)
   PRIMARY KEY,   
   time1   TIME
   );
 
   INSERT INTO orig (key1, time1)
   SELECT
   a::TEXT,
   (((random()*100)::INT %
   24)::TEXT || ':' ||
   ((random()*100)::INT %
   60)::TEXT)::TIME FROM generate_series(800,
   8000200) a;
 
   CREATE INDEX odx ON orig(key1);
 
   CREATE TABLE second (LIKE orig); 
 
   INSERT INTO second (key1)  
   SELECT
   (800+(((random()*100)::INT) % 100))::TEXT
   FROM generate_series(1,40);
   
   EXPLAIN ANALYZE
   UPDATE second SET time1 = orig.time1
   FROM orig
   WHERE second.key1 = orig.key1;  
   
 [.]
   
   UPDATE second SET time1 = NULL;
   
   EXPLAIN ANALYZE
   UPDATE second SET time1 = (SELECT orig.time1 FROM
   orig,second
 
   WHERE orig.key1 = second.key1
   LIMIT 1);   
 
 [.]
   
   These 2 queries are not the same.
  
   
   The first query updates rows in the second table with the
   orig.time1 values based on key1 column match.
   The second query finds first possible match (based on key1
   column) and assigns orig.time1 value from the matched row to
   every record in second table.
   
   Regards,
   Igor Neyman
 
I see that now.  I was trying to reproduce something from work from
memory and got tripped up on a sublety of UPDATE ... SELECT.  The query
I ran at work was like this:
 
EXPLAIN ANALYZE
UPDATE second se SET time1 = (SELECT time1 FROM orig
WHERE orig.key1 = se.key1);

  QUERY PLAN
--
 Update on second se  (cost=0.00..3390627.00 rows=40 width=18) (actual 
time=18698.795..18698.795 rows=0 loops=1)
   -  Seq Scan on second se  (cost=0.00..3390627.00 rows=40 width=18) 
(actual time=7.558..16694.600 rows=40 loops=1)
 SubPlan 1
   -  Index Scan using odx on orig  (cost=0.43..8.45 rows=1 width=8) 
(actual time=0.033..0.035 rows=1 loops=40)
 Index Cond: ((key1)::text = (se.key1)::text)
 Total runtime: 18698.865 ms
(6 rows)

This does correctly match and update all of the second table entries.
The plan actually runs longer than the UPDATE ... FROM, which squares
with a comment the fine manual.

Thanks!
PJ




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general