Re: [PERFORM] setting configuration values inside a stored proc

2011-05-13 Thread Andres Freund
Hi,

On Friday, May 13, 2011 01:10:19 AM Samuel Gendler wrote:
 I've got a stored proc that constructs some aggregation queries as strings
 and then executes them.  I'd like to be able to increase work_mem before
 running those queries.  If I set a new value for work_mem within the stored
 proc prior to executing my query string, will that actually have an impact
 on the query or is work_mem basically a constant once the outer statement
 that calls the stored proc has begun?  I'd just test, but it will take
 hours for me to grab a copy of production data and import into a new db
 host for testing. I've already started that process, but I'm betting I'll
 have an answer by the time it completes.  It's just the difference between
 modifying the application which calls the procs (and doing a full software
 release in order to do so, or else waiting a month to go out in the next
 release) vs modifying the procs themselves, which requires only a db
 update.
I would suggest doing ALTER FUNCTION blub(blarg) SET work_mem = '512MB';
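For the archives, a minimal sketch of that and of the in-function alternative (function name, argument, and query are placeholders, not from Samuel's schema):

-- attach the setting to the function; it applies on entry and reverts on exit
ALTER FUNCTION my_aggregation_proc(integer) SET work_mem = '512MB';

-- or change it from inside the function body, local to the current transaction
CREATE OR REPLACE FUNCTION my_aggregation_proc(p_id integer) RETURNS bigint AS $$
DECLARE
    result bigint;
BEGIN
    PERFORM set_config('work_mem', '512MB', true);  -- true = reverts at transaction end
    EXECUTE 'SELECT count(*) FROM pg_class'         -- stand-in for the dynamically built query
        INTO result;
    RETURN result;
END;
$$ LANGUAGE plpgsql;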

Andres



Re: [ADMIN] [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?

2011-05-13 Thread tv
 It's not synchronous at all.  The clients create a temporary file for
 the statistics collector and move on.  The actual statistics don't get
 updated until the statistics collector decides enough time has passed to
 bother, which defaults to at most every 500ms.

Really? I thought the clients send the updates using a socket, at least
that's what I see in backend/postmaster/pgstat.c (e.g. in
pgstat_send_bgwriter where the data are sent, and in PgstatCollectorMain
where it's read from the socket and applied).

But no matter how exactly this works, this kind of stats has nothing to do
with ANALYZE - it's updated asynchronously every time you run a query.
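
A quick way to watch that asynchrony, for anyone curious (a minimal sketch; the table name is just a placeholder):

select relname, indexrelname, idx_scan
from pg_stat_user_indexes
where relname = 'connections';
-- run a query that uses one of the indexes, give the collector the ~500ms
-- mentioned above, then repeat the select: idx_scan should have increased
-- even though no ANALYZE ever ran.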

regards
Tomas




[PERFORM] How to avoid seq scans for joins between union-all views (test case included)

2011-05-13 Thread Fredrik Widlert
Hi everyone

We have recently started to port an application from Oracle to PostgreSQL.
So far, we are amazed with how great most things work.

However, we have run into performance problems in one type of query which
is quite common in our application. We have created a (simplified)
reproducible test case which (hopefully!) creates all necessary tables
and data to
show the problem.

Plain-text description of the data model in the test case:

We have a set of objects (like electrical cables), each having
two nodes in the table connections (think of these two rows together
as an edge in a graph).

Another table connections_locked contains rows for some of
the same objects, which are locked by a long transaction.

The view connections_v performs a union all of the rows from
connections which are not modified in the current long
transaction with the rows from connections_locked which
are modified in the current long transaction.

Goal:
Given an object id, we want to find all neighbors for this
object (that is, objects which share a node with this object).

Problem:
We think that our query used to find neighbors would benefit
greatly from using some of our indexes, but we fail to make it
do so.


Over to the actual test case:

--

-- Tested on (from select version()):
-- PostgreSQL 9.0.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 32-bit
-- PostgreSQL 9.1beta1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 32-bit

-- Ubuntu 11.04, uname -a output:
-- Linux hostname 2.6.38-8-generic-pae #42-Ubuntu SMP Mon Apr 11 05:17:09 UTC 2011 i686 i686 i386 GNU/Linux
-- Processor: Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz
-- Drive: Intel X25-M SSD


drop table if exists connections cascade;
drop table if exists connections_locked cascade;


create table connections (
  con_id serial primary key,
  locked_by integer not null,
  obj_id integer not null,
  node integer not null
);


-- create test nodes, two per obj_id
insert into connections (locked_by, obj_id, node)
select 0, n/2, 1000 + (n + 1)/2 from generate_series(1,500000) as n;

create index connections_node_idx on connections (node);
create index connections_obj_idx on connections (obj_id);
vacuum analyze connections;



create table connections_locked (
  con_id integer not null,
  locked_by integer not null,
  obj_id integer not null,
  node integer not null,
  constraint locked_pk primary key (con_id, locked_by)
);

-- mark a few of the objects as locked by a long transaction
insert into connections_locked (con_id, locked_by, obj_id, node)
select n, 1 + n/50, n/2, 1000 + (n + 1)/2 from generate_series (1,25000) as n;

create index connections_locked_node_idx on connections_locked (node);
create index connections_locked_obj_idx on connections_locked (obj_id);
vacuum analyze connections_locked;


-- Create a view showing the world as seen by long transaction 4711.
-- In real life, this uses a session variable instead of a hard-coded value.
create or replace view connections_v as
select * from connections where locked_by <> 4711
union all
select * from connections_locked where locked_by = 4711;
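
-- (For reference only, not part of the test case: a hedged sketch of what the
-- session-variable form might look like, using a custom GUC read with
-- current_setting(). On 9.0/9.1 this needs custom_variable_classes = 'myapp'
-- in postgresql.conf; the 'myapp.current_lt' name is made up here.)
--
-- select set_config('myapp.current_lt', '4711', false);
-- create or replace view connections_v as
-- select * from connections
--   where locked_by <> current_setting('myapp.current_lt')::integer
-- union all
-- select * from connections_locked
--   where locked_by = current_setting('myapp.current_lt')::integer;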


-- This is the query we are trying to optimize.
-- We expect this to be able to use our indexes, but instead we get
-- sequential scans.
explain analyze
select
 con2.obj_id
from
 connections_v con1,
 connections_v con2
where
 con1.obj_id = 17 and
 con2.node = con1.node
;


-- Output:
-- Hash Join  (cost=16.69..16368.89 rows=7501 width=4) (actual time=0.096..778.830 rows=4 loops=1)
--   Hash Cond: ("*SELECT* 1".node = "*SELECT* 1".node)
--   ->  Append  (cost=0.00..14402.00 rows=500050 width=8) (actual time=0.011..640.163 rows=500000 loops=1)
--         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..13953.00 rows=500000 width=8) (actual time=0.011..430.645 rows=500000 loops=1)
--               ->  Seq Scan on connections  (cost=0.00..8953.00 rows=500000 width=16) (actual time=0.009..178.535 rows=500000 loops=1)
--                     Filter: (locked_by <> 4711)
--         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..449.00 rows=50 width=8) (actual time=3.254..3.254 rows=0 loops=1)
--               ->  Seq Scan on connections_locked  (cost=0.00..448.50 rows=50 width=16) (actual time=3.253..3.253 rows=0 loops=1)
--                     Filter: (locked_by = 4711)
--   ->  Hash  (cost=16.66..16.66 rows=3 width=4) (actual time=0.028..0.028 rows=2 loops=1)
--         Buckets: 1024  Batches: 1  Memory Usage: 1kB
--         ->  Append  (cost=0.00..16.66 rows=3 width=4) (actual time=0.013..0.025 rows=2 loops=1)
--               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..8.35 rows=2 width=4) (actual time=0.013..0.016 rows=2 loops=1)
--                     ->  Index Scan using connections_obj_idx on connections  (cost=0.00..8.33 rows=2 width=16) (actual time=0.012..0.014 rows=2 loops=1)
--                           Index Cond: (obj_id = 17)
--

Re: [PERFORM] How to avoid seq scans for joins between union-all views (test case included)

2011-05-13 Thread Denis de Bernardy
I might have misread, but:

 select * from connections where locked_by <> 4711
 union all
 select * from connections_locked where locked_by = 4711;


The first part will result in a seq scan irrespective of indexes, and the 
second has no index on locked_by. The best you can do is to eliminate the seq 
scan on the second by adding the missing index on locked_by.
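
(A sketch of that suggestion against the posted test case; the index name is mine:)

create index connections_locked_lockedby_idx on connections_locked (locked_by);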

That said, note that index usage depends on your data distribution: postgres 
may identify that it'll read most/all of the table anyway, and opt to do a 
(cheaper) seq scan instead.

D



Re: [PERFORM] How to avoid seq scans for joins between union-all views (test case included)

2011-05-13 Thread Cédric Villemain
2011/5/13 Denis de Bernardy ddeberna...@yahoo.com:
 I might have misread, but:

 select * from connections where locked_by <> 4711
 union all
 select * from connections_locked where locked_by = 4711;


 The first part will result in a seq scan irrespective of indexes, and the 
 second has no index on locked_by. The best you can do is to eliminate the seq 
 scan on the second by adding the missing index on locked_by.

Just reworking the primary key to put locked_by first should work.
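
(Spelled out against the posted test case; this is my rendering of that suggestion, not Cédric's code:)

create table connections_locked (
  con_id integer not null,
  locked_by integer not null,
  obj_id integer not null,
  node integer not null,
  constraint locked_pk primary key (locked_by, con_id)  -- locked_by first
);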


 That said, note that index usage depends on your data distribution: postgres 
 may identify that it'll read most/all of the table anyway, and opt to do a 
 (cheaper) seq scan instead.


Fredrik, which indexes did Oracle choose? (An index-only scan?)


 D



Re: [PERFORM] How to avoid seq scans for joins between union-all views (test case included)

2011-05-13 Thread Fredrik Widlert
Hi Denis and Cédric

Thanks for your answers.

 Fredrik, which indexes did Oracle choose? (An index-only scan?)

Oracle chooses a plan which looks like this:
SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=7 Bytes=182)
  VIEW OF 'CONNECTIONS_V' (VIEW) (Cost=5 Card=7 Bytes=182)
UNION-ALL
  INLIST ITERATOR
TABLE ACCESS (BY INDEX ROWID) OF 'CONNECTIONS' (TABLE) (Cost=5
Card=6 Bytes=54)
  INDEX (RANGE SCAN) OF 'CONNECTIONS_NODE_IDX' (INDEX) (Cost=4 Card=6)
  INLIST ITERATOR
TABLE ACCESS (BY INDEX ROWID) OF 'CONNECTIONS_LOCKED' (TABLE)
(Cost=0 Card=1 Bytes=39)
  INDEX (RANGE SCAN) OF 'CONNECTIONS_LOCKED_NODE_IDX' (INDEX)
(Cost=0 Card=1)

This means that only the indexes of connections.node and
connections_locked.node are used.

I don't think we want to use any index on locked_by here; we are
hoping for the node = value predicate to be pushed into both halves
of the union all view (not sure if this is the right terminology).

For example, in the simplified-but-still-problematic query
select con2.obj_id from  connections_v con2 where con2.node in (select 1015);
we are hoping for the node-index to be used for both connections and
connections_locked.

We hope to get the same plan/performance as for this query:
select con2.obj_id from connections_v con2 where con2.node in (1015);
Why is there a difference between in (select 1015) and in (1015)?

 That said, note that index usage depends on your data distribution: postgres
 may identify that it'll read most/all of the table anyway, and opt to do a
 (cheaper) seq scan instead.

Yes, I know, but I've tried to create the test case data distribution in a way
I hope makes this unlikely (0.5 million rows in one table, 25000 in the
other table, two rows in each table for each distinct value of node, and only
a few rows returned from the queries).

Thanks again for your answers so far.
/Fredrik



[PERFORM] Link error when use Pgtypes function in windows

2011-05-13 Thread Fanbin Meng
Hi:

I installed PostgreSQL 9.0 from EnterpriseDB with the “one click installer” on
Windows 7 32-bit,

and use Microsoft Visual Studio 2010 C++.

I added libpq.lib to the link properties of the project, and also included the
lib folder and path.

The .c and .cpp files compile successfully after converting the .pgc file to a
.c file using ECPG.

But linking always fails with errors like this:

error LNK2019: unresolved external symbol _PGTYPESinterval_new referenced in
function.

All of the PGTYPEStimestamp and PGTYPESinterval functions cause the
same error.

Can someone help me?

Thanks.

Fanbin


Re: [PERFORM] Link error when use Pgtypes function in windows

2011-05-13 Thread Maciek Sakrejda
 Can someone help me?

You may want to try pgsql-general instead of this list.
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com



Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Robert Haas
On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Apr 26, 2011 at 4:37 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Sok Ann Yap sok...@gmail.com wrote:

 So, index scan wins by a very small margin over sequential scan
 after the tuning. I am a bit puzzled because index scan is more
 than 3000 times faster in this case, but the estimated costs are
 about the same. Did I do something wrong?

 Tuning is generally needed to get best performance from PostgreSQL.
 Needing to reduce random_page_cost is not unusual in situations
 where a good portion of the active data is in cache (between
 shared_buffers and the OS cache).  Please show us your overall
 configuration and give a description of the hardware (how many of
 what kind of cores, how much RAM, what sort of storage system).  The
 configuration part can be obtained by running the query on this page
 and pasting the result into your next post:

 http://wiki.postgresql.org/wiki/Server_Configuration

 There are probably some other configuration adjustments you could do
 to ensure that good plans are chosen.

 The very first thing to check is effective_cache_size and to set it to
 a reasonable value.

Actually, effective_cache_size has no impact on costing except when
planning a nested loop with inner index scan.  So, a query against a
single table can never benefit from changing that setting.  Kevin's
suggestion of adjusting seq_page_cost and random_page_cost is the way
to go.
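
(For anyone following along, a hedged sketch of trying such adjustments per
session before touching postgresql.conf; the values are illustrative only:)

SET seq_page_cost = 1.0;
SET random_page_cost = 2.0;   -- lower than the default 4.0, for a well-cached workload
-- EXPLAIN ANALYZE <the query being tuned>;
RESET seq_page_cost;
RESET random_page_cost;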

We've talked in the past (and I still think it's a good idea, but
haven't gotten around to doing anything about it) about adjusting the
planner to attribute to each relation the percentage of its pages
which we believe we'll find in cache.  Although many complicated ideas
for determining that percentage have been proposed, my favorite one is
fairly simple: assume that small relations will be mostly or entirely
cached, and that big ones won't be.   Allow the administrator to
override the result on a per-relation basis.  It's difficult to
imagine a situation where the planner should assume that a relation
with only a handful of pages isn't going to be cached.  Even if it
isn't, as soon as someone begins accessing it, it will be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 We've talked in the past (and I still think it's a good idea, but
 haven't gotten around to doing anything about it) about adjusting
 the planner to attribute to each relation the percentage of its
 pages which we believe we'll find in cache.  Although many
 complicated ideas for determining that percentage have been
 proposed, my favorite one is fairly simple: assume that small
 relations will be mostly or entirely cached, and that big ones
 won't be.   Allow the administrator to override the result on a
 per-relation basis.  It's difficult to imagine a situation where
 the planner should assume that a relation with only a handful of
 pages isn't going to be cached.  Even if it isn't, as soon as
 someone begins accessing it, it will be.
 
Simple as the heuristic is, I bet it would be effective.  While one
can easily construct a synthetic case where it falls down, the ones
I can think of aren't all that common, and you are suggesting an
override mechanism.
 
-Kevin



Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move

2011-05-13 Thread Robert Haas
On Wed, May 4, 2011 at 6:31 AM, Willy-Bas Loos willy...@gmail.com wrote:
 I'm asking them for (real) benchmarks, thanks for the advice. (fio is not
 available for us now to do it myself, grmbl)
 It just occurred to me that it is not necessarily the case that reading the
 indexes causes a lot of random I/O (on the index itself).
 I mean, maybe the index is generally read sequentially and then, when
 retrieving the data, there is a lot of random I/O.
 if it's a long story, any tips for info about this (book or web site)?

If you don't do anything special, and if the query plan says Index
Scan rather than Bitmap Index Scan, then both the index I/O and the
table I/O are likely to be fairly random.  However there are a number
of cases in which you can expect the table I/O to be sequential:

- In some cases, you may happen to insert rows with an ordering that
matches the index.  For example, if you have a table with not too many
updates and deletes, and an index on a serial column, then new rows
will have a higher value in that column than old rows, and will also
typically be physically after older rows in the file.  Or you might be
inserting timestamped data from oldest to newest.
- If the planner chooses a Bitmap Index Scan, it effectively scans the
index to figure out which table blocks to read, and then reads those
table blocks in block number order, so that the I/O is sequential,
with skips.
- If you CLUSTER the table on a particular index, it will be
physically ordered to match the index's key ordering.  As the table is
further modified the degree of clustering will gradually decline;
eventually you may wish to re-CLUSTER.
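
(A minimal sketch of that last point, borrowing the connections table from the
union-all thread earlier in this digest purely as a stand-in:)

CLUSTER connections USING connections_node_idx;  -- rewrite the heap in index order
ANALYZE connections;                             -- refresh the correlation statistics
-- repeat periodically as updates erode the ordering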

It's also worth keeping in mind that the index itself won't
necessarily be accessed in physically sequential order.  The point of
the index is to emit the rows in key order, but if the table is
heavily updated, it won't necessarily be the case that a page
containing lower-valued keys physically precedes a page containing
higher-valued keys.  I'm actually somewhat fuzzy on how this works,
and to what extent it's a problem in practice, but I am fairly sure it
can happen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure mmonc...@gmail.com wrote:
 The very first thing to check is effective_cache_size and to set it to
 a reasonable value.

 Actually, effective_cache_size has no impact on costing except when
 planning a nested loop with inner index scan.  So, a query against a
 single table can never benefit from changing that setting.

That's flat out wrong.  It does affect the cost estimate for plain
indexscan (and bitmap indexscan) plans.

regards, tom lane



Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Robert Haas
On Fri, May 13, 2011 at 3:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure mmonc...@gmail.com wrote:
 The very first thing to check is effective_cache_size and to set it to
 a reasonable value.

 Actually, effective_cache_size has no impact on costing except when
 planning a nested loop with inner index scan.  So, a query against a
 single table can never benefit from changing that setting.

 That's flat out wrong.  It does affect the cost estimate for plain
 indexscan (and bitmap indexscan) plans.

rereads code

OK, I agree.  I obviously misinterpreted this code the last time I read it.

I guess maybe the reason why it didn't matter for the OP is that - if
the size of the index in pages is smaller than the pro-rated
fraction of effective_cache_size allowed to the index - then the exact
value doesn't affect the answer.

I apparently need to study this code more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Josh Berkus

 I guess maybe the reason why it didn't matter for the OP is that - if
 the size of the index in pages is smaller than the pro-rated
 fraction of effective_cache_size allowed to the index - then the exact
 value doesn't affect the answer.
 
 I apparently need to study this code more.

FWIW: random_page_cost is meant to be the ratio between the cost of
looking up a single row as an index lookup, and the cost of looking up
that same row as part of a larger sequential scan.  For specific
storage, that coefficient should be roughly the same regardless of the
table size.  So if your plan for optimization involves manipulating RPC
for anything other than a change of storage, you're Doing It Wrong.

Instead, we should be fixing the formulas these are based on and leaving
RPC alone.

For any data page, there are actually four costs associated with each
tuple lookup, per:

in-memory/seq    | on disk/seq
-----------------+-----------------
in-memory/random | on disk/random

(yes, there's actually more for bitmapscan etc.  but the example holds)

For any given tuple lookup, then, you can assign a cost based on where
you think that tuple falls in that quadrant map.  Since this is all
probability-based, you'd be assigning a cost as a mixed % of in-memory
and on-disk costs.  Improvements in accuracy of this formula would come
through improvements in accuracy in predicting if a particular data page
will be in memory.
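
To make that concrete with my own notation (not anything lifted from the
planner source): if p is the estimated probability that the page holding a
tuple is already in memory, the blended per-page cost would be something like

   page_cost ~= p * memory_page_cost + (1 - p) * disk_page_cost

applied separately to the sequential and random columns of the quadrant above,
with effective_cache_size feeding the estimate of p.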

This is what the combination of random_page_cost and
effective_cache_size ought to supply, but I don't think it does, quite.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Cédric Villemain
2011/5/13 Josh Berkus j...@agliodbs.com:

 I guess maybe the reason why it didn't matter for the OP is that - if
 the size of the index in pages is smaller than the pro-rated
 fraction of effective_cache_size allowed to the index - then the exact
 value doesn't affect the answer.

 I apparently need to study this code more.

 FWIW: random_page_cost is meant to be the ratio between the cost of
 looking up a single row as an index lookup, and the cost of looking up
 that same row as part of a larger sequential scan.  For specific
 storage, that coefficient should be roughly the same regardless of the
 table size.  So if your plan for optimization involves manipulating RPC
 for anything other than a change of storage, you're Doing It Wrong.

 Instead, we should be fixing the formulas these are based on and leaving
 RPC alone.

 For any data page, there are actually four costs associated with each
 tuple lookup, per:

 in-memory/seq    | on disk/seq
 -----------------+-----------------
 in-memory/random | on disk/random

It lacks some more theoretical costs like sort_page/temp_page: those are
based on a ratio of seq_page_cost and random_page_cost, or on a plain
seq_page_cost (when working out of work_mem).

Memory access is accounted for with a factor of about 0.1 in some places, AFAIR
(and random vs. sequential memory access is treated the same at the level of
estimation we do).


 (yes, there's actually more for bitmapscan etc.  but the example holds)

(If I read the sources correctly, for this one there is a linear approach
that weights the cost between random_page_cost and seq_page_cost for the
heap page fetches, plus the Mackert and Lohman formula where needed, used
for what it is best at: predicting what should be in cache *because* of the
current query execution, not because of the current state of the page cache.)


 For any given tuple lookup, then, you can assign a cost based on where
 you think that tuple falls in that quadrant map.  Since this is all
 probability-based, you'd be assigning a cost as a mixed % of in-memory
 and on-disk costs.  Improvements in accuracy of this formula would come
 through improvements in accuracy in predicting if a particular data page
 will be in memory.

 This is what the combination of random_page_cost and
 effective_cache_size ought to supply, but I don't think it does, quite.

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com





-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



[PERFORM] Using pgiosim realistically

2011-05-13 Thread John Rouillard
Hi all:

I am adding pgiosim to our testing for new database hardware and I am
seeing something I don't quite get and I think it's because I am using
pgiosim incorrectly.

Specs:

  OS: centos 5.5 kernel: 2.6.18-194.32.1.el5
  memory: 96GB
  cpu: 2x Intel(R) Xeon(R) X5690  @ 3.47GHz (6 core, ht enabled)
  disks: WD2003FYYS RE4
  raid: lsi - 9260-4i with 8 disks in raid 10 configuration
  1MB stripe size
  raid cache enabled w/ bbu
  disk caches disabled
  filesystem: ext3 created with -E stride=256

I am seeing really poor (70) iops with pgiosim.  According to
http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html, in the
database benchmark they are seeing ~170 iops on a single disk for
these drives. I would expect an 8 disk raid 10 to get better than
3x the single disk rate (assuming the data is randomly distributed).

To test I am using 5 100GB files with

sudo ~/pgiosim -c -b 100G -v file?

I am using 100G sizes to make sure that the data read and files sizes
exceed the memory size of the system.

However if I use 5 1GB files (and still 100GB read data) I see 200+ to
400+ iops at 50% of the 100GB of data read, which I assume means that
the data is cached in the OS cache and I am not really getting hard
drive/raid I/O measurement of iops.

However, IIUC postgres will never have an index file greater than 1GB
in size
(http://www.postgresql.org/docs/8.4/static/storage-file-layout.html)
and will just add 1GB segments, so the 1GB size files seems to be more
realistic.

So do I want 100 (or probably 2 or 3 times more say 300) 1GB files to
feed pgiosim? That way I will have enough data that not all of it can
be cached in memory and the file sizes (and file operations:
open/close) more closely match what postgres is doing with index
files?
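
(One way to pick realistic file counts and sizes is to look at an existing
database; a hedged sketch, nothing pgiosim-specific about it:)

SELECT relname, relkind, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relkind IN ('r', 'i')
ORDER BY pg_relation_size(oid) DESC
LIMIT 10;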

Also in the output of pgiosim I see:

  25.17%,   2881 read,  0 written, 2304.56kB/sec  288.07 iops

which I interpret (left to right) as the % of the 100GB that has been
read, the number of read operations over some time period, number of
bytes read/written and the io operations/sec. Iops always seems to be
1/10th of the read number (rounded up to an integer). Is this
expected, and if so, does anybody know why?

While this is running if I also run iostat -p /dev/sdc 5 I see:

  Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  sdc 166.40  2652.80 4.80  13264 24
  sdc1   2818.80 1.20   999.20  6   4996

which I am interpreting as 2818 read/io operations (corresponding more
or less to read in the pgiosim output) going to the partition, of which
only 166 are actually going to the drive??? with the rest handled from
the OS cache.

However the tps isn't increasing when I see pgiosim reporting:

   48.47%,   4610 read,  0 written, 3687.62kB/sec  460.95 iops

an iostat 5 output near the same time is reporting:

  Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  sdc 165.87  2647.50 4.79  13264 24
  sdc1   2812.97 0.60   995.41  3   4987

so I am not sure if there is a correlation between the pgiosim read
numbers and the iostat tps values.

Also, I am assuming the blocks written are filesystem metadata, although
that seems like a lot of data.

If I stop the pgiosim, the iostat drops to 0 write and reads as
expected.

So does anybody have any comments on how to test with pgiosim and how
to correlate the iostat and pgiosim outputs?

Thanks for your feedback.
-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Robert Haas
On Fri, May 13, 2011 at 4:13 PM, Josh Berkus j...@agliodbs.com wrote:
 Instead, we should be fixing the formulas these are based on and leaving
 RPC alone.

 For any data page, there are actually four costs associated with each
 tuple lookup, per:

All true.  I suspect that in practice the difference between random and
sequential memory page costs is small enough to be ignorable, although
of course I might be wrong.  I've never seen a database that was fully
cached in memory where it was necessary to set
random_page_cost > seq_page_cost to get good plans -- no doubt partly
because even if the pages were consecutive on disk, there's no reason
to suppose they would be so in memory, and we certainly wouldn't know
one way or the other at planning time.   But I agree we should add a
cached_page_cost as part of all this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Query improvement

2011-05-13 Thread Robert Haas
On Mon, May 2, 2011 at 3:58 AM, Claudio Freire klaussfre...@gmail.com wrote:
 Hash joins are very inefficient if they require big temporary files.

Hmm, that's not been my experience.  What have you seen?

I've seen a 64-batch hash join beat out a
nested-loop-with-inner-indexscan, which I never woulda believed,
but...
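
(For the archives, a hedged sketch of watching that behaviour, reusing the
test tables from the union-all thread in this digest purely as stand-ins:)

SET work_mem = '1MB';
EXPLAIN ANALYZE
SELECT c.obj_id
FROM connections c
JOIN connections_locked l ON l.node = c.node;
-- the Hash node reports "Batches: N"; N > 1 means the join spilled to temp files

SET work_mem = '64MB';
-- rerun the EXPLAIN ANALYZE above; Batches should drop back toward 1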

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
