Re: [GENERAL] Can you spot the difference?

2013-04-17 Thread Moshe Jacobson
On Tue, Apr 16, 2013 at 7:29 PM, Adrian Klaver adrian.kla...@gmail.comwrote:

 
 The autovacuum daemon, if enabled, will automatically issue ANALYZE
 commands whenever the content of a table has changed sufficiently. However,
 administrators might prefer to rely on manually-scheduled ANALYZE
 operations, particularly if it is known that update activity on a table
 will not affect the statistics of interesting columns. The daemon
 schedules ANALYZE strictly as a function of the number of rows inserted or
 updated; it has no knowledge of whether that will lead to meaningful
 statistical changes.
 

 So at a guess there has not been enough churn on the table.


So pg_restore's COPY would not trigger the ANALYZE? That seems wrong.


-- 
Moshe Jacobson
Nead Werx, Inc. | Manager of Systems Engineering
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
mo...@neadwerx.com | www.neadwerx.com

Quality is not an act, it is a habit. -- Aristotle


Re: [GENERAL] Can you spot the difference?

2013-04-17 Thread Adrian Klaver

On 04/17/2013 07:49 AM, Moshe Jacobson wrote:


On Tue, Apr 16, 2013 at 7:29 PM, Adrian Klaver adrian.kla...@gmail.com
mailto:adrian.kla...@gmail.com wrote:


The autovacuum daemon, if enabled, will automatically issue ANALYZE
commands whenever the content of a table has changed sufficiently.
However, administrators might prefer to rely on manually-scheduled
ANALYZE operations, particularly if it is known that update activity
on a table will not affect the statistics of interesting columns.
The daemon schedules ANALYZE strictly as a function of the number of
rows inserted or updated; it has no knowledge of whether that will
lead to meaningful statistical changes.


So at a guess there has not been enough churn on the table.


So pg_restore's COPY would not trigger the ANALYZE? That seems wrong.


Well the argument most often heard is that the command has a cost and it 
left to the discretion of the user as to when to incur that cost. For 
instance I, and others, often use COPY to transfer data from some 
external data source to a holding table, from which the data is then 
manipulated/transferred to one or more tables. I have not interest in 
having the holding table ANALYZEd as I am going to hit all the rows 
anyway. Generally what people do in your situation is include a manual 
ANALYZE in a script that is part of or follows the COPY.





--
Moshe Jacobson
Nead Werx, Inc. | Manager of Systems Engineering
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
mo...@neadwerx.com mailto:mo...@neadwerx.com | www.neadwerx.com
http://www.neadwerx.com/

Quality is not an act, it is a habit. -- Aristotle



--
Adrian Klaver
adrian.kla...@gmail.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Can you spot the difference?

2013-04-16 Thread Moshe Jacobson
Hi PostgreSQL friends,

I have two databases in the same cluster that are almost identical. One is
a copy of the other as we are developing some new features in the copy.

My problem is that the exact same simple query performs great in the
original database (ises) and dismally in the copy database
(ises_coelacanth). The problem is that in ises, it uses an index scan,
but in ises_coelacanth it uses a sequential scan:

postgres@moshe=devmain:ises=# explain analyze SELECT count(*)  FROM
 tb_order_location ol   JOIN tb_line_item li on li.order_location =
 ol.order_location   WHERE  li.tracking_number = '10137378459';

   QUERY PLAN

 --
  Aggregate  (cost=671.48..671.49 rows=1 width=0) (actual
 time=0.272..0.272 rows=1 loops=1)
-  Nested Loop  (cost=0.00..671.34 rows=54 width=0) (actual
 time=0.124..0.265 rows=16 loops=1)
  -  Index Scan using tb_line_item_tracking_number_key on
 tb_line_item li  (cost=0.00..219.17 rows=54 width=4) (actual
 time=0.087..0.161 rows=16 loops=1)
Index Cond: ((tracking_number)::text = '10137378459'::text)
  -  Index Scan using tb_order_location_pkey on tb_order_location
 ol  (cost=0.00..8.36 rows=1 width=4) (actual time=0.005..0.005 rows=1
 loops=16)
Index Cond: (order_location = li.order_location)
  Total runtime: 0.343 ms
 (7 rows)



 postgres@moshe=devmain:ises_coelacanth=# explain analyze SELECT count(*)
  FROM tb_order_location ol   JOIN tb_line_item li on li.order_location =
 ol.order_location   WHERE  li.tracking_number = '10137378459';

 QUERY PLAN

 ---
  Aggregate  (cost=50467.40..50467.41 rows=1 width=0) (actual
 time=333.490..333.491 rows=1 loops=1)
-  Hash Join  (cost=26551.11..50447.62 rows=7915 width=0) (actual
 time=332.045..333.481 rows=16 loops=1)
  Hash Cond: (li.order_location = ol.order_location)
  -  Bitmap Heap Scan on tb_line_item li  (cost=177.82..20715.03
 rows=7915 width=4) (actual time=0.128..0.209 rows=16 loops=1)
Recheck Cond: ((tracking_number)::text =
 '10137378459'::text)
-  Bitmap Index Scan on tb_line_item_tracking_number_key
  (cost=0.00..175.84 rows=7915 width=0) (actual time=0.108..0.108 rows=16
 loops=1)
  Index Cond: ((tracking_number)::text =
 '10137378459'::text)
  -  Hash  (cost=13190.24..13190.24 rows=803524 width=4) (actual
 time=324.114..324.114 rows=803553 loops=1)
Buckets: 4096  Batches: 32  Memory Usage: 887kB
-  Seq Scan on tb_order_location ol  (cost=0.00..13190.24
 rows=803524 width=4) (actual time=0.024..144.581 rows=803553 loops=1)
  Total runtime: 333.766 ms
 (11 rows)


Both of these queries return 16 rows, as you can see.

Below I've included the information on each of these tables. They have the
same indexes and are identical for the purposes of this query.

Can you help me figure out what is going on here?? Thank you!

 postgres@moshe=devmain:ises=# \d tb_line_item
Table public.tb_line_item
Column   |Type |
 Modifiers

 +-+---
  line_item  | integer | not null
 default nextval('sq_pk_line_item'::regclass)
 (...)
  order_location | integer | not null
 (...)
  tracking_number| character varying(512)  |
 (...)
 Indexes:
 tb_line_item_pkey PRIMARY KEY, btree (line_item)
 tb_line_item_order_catalog_article_key UNIQUE CONSTRAINT, btree
 (order_catalog_article, order_location, project, creator)
 tb_line_item_order_vendor_article_key UNIQUE CONSTRAINT, btree
 (order_vendor_article, order_location, project, creator)
 idx_line_item_canceled btree (canceled)
 ix_line_item_project btree (project)
 ix_line_item_reset btree (reset)
 tb_line_item_order_location_key btree (order_location)
 tb_line_item_tracking_number_key btree (tracking_number)
 Check constraints:
 chk_order_vendor_article_or_order_catalog_article CHECK
 (order_vendor_article IS NULL AND order_catalog_article IS NOT NULL OR
 order_vendor_article IS NOT NULL AND order_catalog_article IS NULL)
 tb_line_item_check CHECK (
 CASE
 WHEN executed IS NOT NULL AND canceled IS NOT NULL THEN false
 ELSE true
 END)
 tb_line_item_quantity_backordered_check CHECK (quantity_backordered
 = 0::numeric)
 tb_line_item_quantity_ordered_check CHECK (quantity_ordered 
 0::numeric)
 tb_line_item_unit_price_check CHECK (unit_price = 0::numeric)
 Foreign-key constraints:
 (...)
 

Re: [GENERAL] Can you spot the difference?

2013-04-16 Thread Adrian Klaver

On 04/16/2013 12:07 PM, Moshe Jacobson wrote:

Hi PostgreSQL friends,

I have two databases in the same cluster that are almost identical. One
is a copy of the other as we are developing some new features in the copy.

My problem is that the exact same simple query performs great in the
original database (ises) and dismally in the copy database
(ises_coelacanth). The problem is that in ises, it uses an index scan,
but in ises_coelacanth it uses a sequential scan:



The difference is that Postgres is coming to alternate conclusions as to 
what plan to use. Given that the copy is causing the 'problem', the 
question to ask is; did you run ANALYZE on the table once the data was 
copied in?






--
Moshe Jacobson
Nead Werx, Inc. | Manager of Systems Engineering
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
mo...@neadwerx.com mailto:mo...@neadwerx.com | www.neadwerx.com
http://www.neadwerx.com/

Quality is not an act, it is a habit. -- Aristotle



--
Adrian Klaver
adrian.kla...@gmail.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can you spot the difference?

2013-04-16 Thread Tom Lane
Moshe Jacobson mo...@neadwerx.com writes:
 My problem is that the exact same simple query performs great in the
 original database (ises) and dismally in the copy database
 (ises_coelacanth). The problem is that in ises, it uses an index scan,
 but in ises_coelacanth it uses a sequential scan:

The rowcount estimates are much further away from reality in the second
database.  Either you forgot to run ANALYZE at all, or the stats target
settings are different (and lower) in the second DB.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can you spot the difference?

2013-04-16 Thread Moshe Jacobson
That was it! Thanks Adrian and Tom!


On Tue, Apr 16, 2013 at 3:29 PM, Adrian Klaver adrian.kla...@gmail.comwrote:

 On 04/16/2013 12:07 PM, Moshe Jacobson wrote:

 Hi PostgreSQL friends,

 I have two databases in the same cluster that are almost identical. One
 is a copy of the other as we are developing some new features in the copy.

 My problem is that the exact same simple query performs great in the
 original database (ises) and dismally in the copy database
 (ises_coelacanth). The problem is that in ises, it uses an index scan,
 but in ises_coelacanth it uses a sequential scan:



 The difference is that Postgres is coming to alternate conclusions as to
 what plan to use. Given that the copy is causing the 'problem', the
 question to ask is; did you run ANALYZE on the table once the data was
 copied in?




 --
 Moshe Jacobson
 Nead Werx, Inc. | Manager of Systems Engineering
 2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
 mo...@neadwerx.com mailto:mo...@neadwerx.com | www.neadwerx.com
 http://www.neadwerx.com/


 Quality is not an act, it is a habit. -- Aristotle



 --
 Adrian Klaver
 adrian.kla...@gmail.com




-- 
Moshe Jacobson
Nead Werx, Inc. | Manager of Systems Engineering
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
mo...@neadwerx.com | www.neadwerx.com

Quality is not an act, it is a habit. -- Aristotle


Re: [GENERAL] Can you spot the difference?

2013-04-16 Thread Igor Neyman
Statistics on your original and copy databases must be different.
For the same condition (li.tracking_number = '10137378459') optimizer expects 
to find 7915 rows in tb_line_item table on the copy database while only 54 
rows on the original database.
Also, the other table (tb_order_location)  could have bad statistics as well.
That could cause different execution plans.

Regards,
Igor Neyman

From: Moshe Jacobson [mailto:mo...@neadwerx.com]
Sent: Tuesday, April 16, 2013 3:07 PM
To: pgsql-general
Subject: Can you spot the difference?

Hi PostgreSQL friends,

I have two databases in the same cluster that are almost identical. One is a 
copy of the other as we are developing some new features in the copy.

My problem is that the exact same simple query performs great in the original 
database (ises) and dismally in the copy database (ises_coelacanth). The 
problem is that in ises, it uses an index scan, but in ises_coelacanth it uses 
a sequential scan:

postgres@moshe=devmain:ises=# explain analyze SELECT count(*)  FROM 
tb_order_location ol   JOIN tb_line_item li on li.order_location = 
ol.order_location   WHERE  li.tracking_number = '10137378459';

QUERY PLAN
--
 Aggregate  (cost=671.48..671.49tel:671.48..671.49 rows=1 width=0) (actual 
time=0.272..0.272 rows=1 loops=1)
   -  Nested Loop  (cost=0.00..671.34 rows=54 width=0) (actual 
time=0.124..0.265 rows=16 loops=1)
 -  Index Scan using tb_line_item_tracking_number_key on tb_line_item 
li  (cost=0.00..219.17 rows=54 width=4) (actual time=0.087..0.161 rows=16 
loops=1)
   Index Cond: ((tracking_number)::text = '10137378459'::text)
 -  Index Scan using tb_order_location_pkey on tb_order_location ol  
(cost=0.00..8.36 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=16)
   Index Cond: (order_location = li.order_location)
 Total runtime: 0.343 ms
(7 rows)

postgres@moshe=devmain:ises_coelacanth=# explain analyze SELECT count(*)  FROM 
tb_order_location ol   JOIN tb_line_item li on li.order_location = 
ol.order_location   WHERE  li.tracking_number = '10137378459';
QUERY 
PLAN
---
 Aggregate  (cost=50467.40..50467.41 rows=1 width=0) (actual 
time=333.490..333.491 rows=1 loops=1)
   -  Hash Join  (cost=26551.11..50447.62 rows=7915 width=0) (actual 
time=332.045..333.481 rows=16 loops=1)
 Hash Cond: (li.order_location = ol.order_location)
 -  Bitmap Heap Scan on tb_line_item li  (cost=177.82..20715.03 
rows=7915 width=4) (actual time=0.128..0.209 rows=16 loops=1)
   Recheck Cond: ((tracking_number)::text = '10137378459'::text)
   -  Bitmap Index Scan on tb_line_item_tracking_number_key  
(cost=0.00..175.84 rows=7915 width=0) (actual time=0.108..0.108 rows=16 loops=1)
 Index Cond: ((tracking_number)::text = '10137378459'::text)
 -  Hash  (cost=13190.24..13190.24 rows=803524 width=4) (actual 
time=324.114..324.114 rows=803553 loops=1)
   Buckets: 4096  Batches: 32  Memory Usage: 887kB
   -  Seq Scan on tb_order_location ol  (cost=0.00..13190.24 
rows=803524 width=4) (actual time=0.024..144.581 rows=803553 loops=1)
 Total runtime: 333.766 ms
(11 rows)

Both of these queries return 16 rows, as you can see.

Below I've included the information on each of these tables. They have the same 
indexes and are identical for the purposes of this query.

Can you help me figure out what is going on here?? Thank you!

postgres@moshe=devmain:ises=# \d tb_line_item
   Table public.tb_line_item
   Column   |Type | 
  Modifiers
+-+---
 line_item  | integer | not null default 
nextval('sq_pk_line_item'::regclass)
(...)
 order_location | integer | not null
(...)
 tracking_number| character varying(512)  |
(...)
Indexes:
tb_line_item_pkey PRIMARY KEY, btree (line_item)
tb_line_item_order_catalog_article_key UNIQUE CONSTRAINT, btree 
(order_catalog_article, order_location, project, creator)
tb_line_item_order_vendor_article_key UNIQUE CONSTRAINT, btree 
(order_vendor_article, order_location, project, creator)
idx_line_item_canceled btree (canceled)
ix_line_item_project btree (project)
ix_line_item_reset btree (reset)
tb_line_item_order_location_key btree (order_location

Re: [GENERAL] Can you spot the difference?

2013-04-16 Thread Moshe Jacobson
On Tue, Apr 16, 2013 at 3:29 PM, Adrian Klaver adrian.kla...@gmail.comwrote:

 Given that the copy is causing the 'problem', the question to ask is; did
 you run ANALYZE on the table once the data was copied in?


I did not -- I expected the autovacuum daemon to do so. Why did it not?
The database was created  restored days ago, and the autovacuum daemon is
running with default settings.

Thanks.

-- 
Moshe Jacobson
Nead Werx, Inc. | Manager of Systems Engineering
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
mo...@neadwerx.com | www.neadwerx.com

Quality is not an act, it is a habit. -- Aristotle


Re: [GENERAL] Can you spot the difference?

2013-04-16 Thread Adrian Klaver

On 04/16/2013 01:55 PM, Moshe Jacobson wrote:

On Tue, Apr 16, 2013 at 3:29 PM, Adrian Klaver adrian.kla...@gmail.com
mailto:adrian.kla...@gmail.com wrote:

Given that the copy is causing the 'problem', the question to ask
is; did you run ANALYZE on the table once the data was copied in?


I did not -- I expected the autovacuum daemon to do so. Why did it not?
The database was created  restored days ago, and the autovacuum daemon
is running with default settings.


http://www.postgresql.org/docs/9.2/static/routine-vacuuming.html


The autovacuum daemon, if enabled, will automatically issue ANALYZE 
commands whenever the content of a table has changed sufficiently. 
However, administrators might prefer to rely on manually-scheduled 
ANALYZE operations, particularly if it is known that update activity on 
a table will not affect the statistics of interesting columns. The 
daemon schedules ANALYZE strictly as a function of the number of rows 
inserted or updated; it has no knowledge of whether that will lead to 
meaningful statistical changes.



So at a guess there has not been enough churn on the table.



Thanks.

--
Moshe Jacobson
Nead Werx, Inc. | Manager of Systems Engineering
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
mo...@neadwerx.com mailto:mo...@neadwerx.com | www.neadwerx.com
http://www.neadwerx.com/

Quality is not an act, it is a habit. -- Aristotle



--
Adrian Klaver
adrian.kla...@gmail.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general