Re: [PERFORM] setting configuration values inside a stored proc
Hi,

On Friday, May 13, 2011 01:10:19 AM Samuel Gendler wrote:
> I've got a stored proc that constructs some aggregation queries as strings
> and then executes them. I'd like to be able to increase work_mem before
> running those queries. If I set a new value for work_mem within the stored
> proc prior to executing my query string, will that actually have an impact
> on the query, or is work_mem basically a constant once the outer statement
> that calls the stored proc has begun?
>
> I'd just test, but it will take hours for me to grab a copy of production
> data and import it into a new db host for testing. I've already started
> that process, but I'm betting I'll have an answer by the time it completes.
> It's just the difference between modifying the application which calls the
> procs (and doing a full software release in order to do so, or else waiting
> a month to go in the next release) vs modifying the procs themselves, which
> requires only a db update.

I would suggest doing ALTER FUNCTION blub(blarg) SET work_mem = '512MB';

Andres

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
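A minimal sketch of Andres's suggestion; the function name and argument type here are placeholders for the real proc's signature:

```sql
-- Per-function setting: applied when the function is entered,
-- restored automatically when it returns.
alter function my_agg_proc(integer) set work_mem = '512MB';

-- Alternative inside the function body: SET LOCAL lasts only until
-- the end of the current transaction.
-- SET LOCAL work_mem = '512MB';
```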
Re: [ADMIN] [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
> It's not synchronous at all. The clients create a temporary file for the
> statistics collector and move on. The actual statistics don't get updated
> until the statistics collector decides enough time has passed to bother,
> which defaults to at most every 500ms.

Really? I thought the clients send the updates using a socket; at least
that's what I see in backend/postmaster/pgstat.c (e.g. in
pgstat_send_bgwriter, where the data are sent, and in PgstatCollectorMain,
where they're read from the socket and applied).

But no matter how exactly this works, this kind of stats has nothing to do
with ANALYZE - it's asynchronously updated every time you run a query.

regards
Tomas
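For reference, the counters under discussion can be watched with a query along these lines (the index name is a placeholder); the values lag behind query activity by the collector's update interval:

```sql
-- idx_scan counts index scans started on this index; idx_tup_read
-- counts index entries returned by those scans.
select relname, indexrelname, idx_scan, idx_tup_read
from pg_stat_user_indexes
where indexrelname = 'some_index';  -- placeholder index name
```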
[PERFORM] How to avoid seq scans for joins between union-all views (test case included)
Hi everyone,

We have recently started to port an application from Oracle to PostgreSQL.
So far, we are amazed with how great most things work. However, we have run
into performance problems in one type of query which is quite common in our
application. We have created a (simplified) reproducible test case which
(hopefully!) creates all necessary tables and data to show the problem.

Plain-text description of the data model in the test case:

We have a set of objects (like electrical cables), each having two nodes in
the table connections (think of these two rows together as an edge in a
graph). Another table connections_locked contains rows for some of the same
objects, which are locked by a long transaction. The view connections_v
performs a union all of the rows from connections which are not modified in
the current long transaction with the rows from connections_locked which are
modified in the current long transaction.

Goal: Given an object id, we want to find all neighbors for this object
(that is, objects which share a node with this object).

Problem: We think that our query used to find neighbors would benefit
greatly from using some of our indexes, but we fail to make it do so.
Over to the actual test case:

--
-- Tested on (from select version()):
-- PostgreSQL 9.0.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 32-bit
-- PostgreSQL 9.1beta1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 32-bit
-- Ubuntu 11.04, uname -a output:
-- Linux hostname 2.6.38-8-generic-pae #42-Ubuntu SMP Mon Apr 11 05:17:09 UTC 2011 i686 i686 i386 GNU/Linux
-- Processor: Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz
-- Drive: Intel X25-M SSD

drop table if exists connections cascade;
drop table if exists connections_locked cascade;

create table connections (
  con_id serial primary key,
  locked_by integer not null,
  obj_id integer not null,
  node integer not null
);

-- create test nodes, two per obj_id
insert into connections (locked_by, obj_id, node)
select 0, n/2, 1000 + (n + 1)/2 from generate_series(1, 500000) as n;

create index connections_node_idx on connections (node);
create index connections_obj_idx on connections (obj_id);
vacuum analyze connections;

create table connections_locked (
  con_id integer not null,
  locked_by integer not null,
  obj_id integer not null,
  node integer not null,
  constraint locked_pk primary key (con_id, locked_by)
);

-- mark a few of the objects as locked by a long transaction
insert into connections_locked (con_id, locked_by, obj_id, node)
select n, 1 + n/50, n/2, 1000 + (n + 1)/2 from generate_series(1, 25000) as n;

create index connections_locked_node_idx on connections_locked (node);
create index connections_locked_obj_idx on connections_locked (obj_id);
vacuum analyze connections_locked;

-- Create a view showing the world as seen by long transaction 4711.
-- In real life, this uses a session variable instead of a hard-coded value.
create or replace view connections_v as
select * from connections where locked_by <> 4711
union all
select * from connections_locked where locked_by = 4711;

-- This is the query we are trying to optimize.
-- We expect this to be able to use our indexes, but instead get sequential scans
explain analyze
select con2.obj_id
from connections_v con1, connections_v con2
where con1.obj_id = 17
and con2.node = con1.node;

-- Output:
-- Hash Join  (cost=16.69..16368.89 rows=7501 width=4) (actual time=0.096..778.830 rows=4 loops=1)
--   Hash Cond: ("*SELECT* 1".node = "*SELECT* 1".node)
--   ->  Append  (cost=0.00..14402.00 rows=500050 width=8) (actual time=0.011..640.163 rows=500000 loops=1)
--         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..13953.00 rows=500000 width=8) (actual time=0.011..430.645 rows=500000 loops=1)
--               ->  Seq Scan on connections  (cost=0.00..8953.00 rows=500000 width=16) (actual time=0.009..178.535 rows=500000 loops=1)
--                     Filter: (locked_by <> 4711)
--         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..449.00 rows=50 width=8) (actual time=3.254..3.254 rows=0 loops=1)
--               ->  Seq Scan on connections_locked  (cost=0.00..448.50 rows=50 width=16) (actual time=3.253..3.253 rows=0 loops=1)
--                     Filter: (locked_by = 4711)
--   ->  Hash  (cost=16.66..16.66 rows=3 width=4) (actual time=0.028..0.028 rows=2 loops=1)
--         Buckets: 1024  Batches: 1  Memory Usage: 1kB
--         ->  Append  (cost=0.00..16.66 rows=3 width=4) (actual time=0.013..0.025 rows=2 loops=1)
--               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..8.35 rows=2 width=4) (actual time=0.013..0.016 rows=2 loops=1)
--                     ->  Index Scan using connections_obj_idx on connections  (cost=0.00..8.33 rows=2 width=16) (actual time=0.012..0.014 rows=2 loops=1)
--                           Index Cond: (obj_id = 17)
--
Re: [PERFORM] How to avoid seq scans for joins between union-all views (test case included)
I might have misread, but:

> select * from connections where locked_by <> 4711
> union all
> select * from connections_locked where locked_by = 4711;

The first part will result in a seq scan irrespective of indexes, and the
second has no index on locked_by. The best you can do is to eliminate the
seq scan on the second by adding the missing index on locked_by.

That said, note that index usage depends on your data distribution: postgres
may identify that it'll read most/all of the table anyway, and opt to do a
(cheaper) seq scan instead.

D

----- Original Message -----
From: Fredrik Widlert fredrik.widl...@digpro.se
To: pgsql-performance@postgresql.org
Sent: Friday, May 13, 2011 1:55 PM
Subject: [PERFORM] How to avoid seq scans for joins between union-all views (test case included)

> [snip]
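Denis's missing-index suggestion, sketched against the schema from the test case (the index name is mine):

```sql
-- Gives the second UNION ALL branch (where locked_by = 4711) an
-- index to use instead of a seq scan.
create index connections_locked_lockedby_idx
    on connections_locked (locked_by);
```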
Re: [PERFORM] How to avoid seq scans for joins between union-all views (test case included)
2011/5/13 Denis de Bernardy ddeberna...@yahoo.com:
> I might have misread, but:
>
> select * from connections where locked_by <> 4711
> union all
> select * from connections_locked where locked_by = 4711;
>
> The first part will result in a seq scan irrespective of indexes, and the
> second has no index on locked_by. The best you can do is to eliminate the
> seq scan on the second by adding the missing index on locked_by.

Just reworking the primary key to put locked_by first should work.

> That said, note that index usage depends on your data distribution:
> postgres may identify that it'll read most/all of the table anyway, and
> opt to do a (cheaper) seq scan instead.

Fredrik, what indexes did Oracle choose? (index-only scan?)

> [snip]
Re: [PERFORM] How to avoid seq scans for joins between union-all views (test case included)
Hi Denis and Cédric,

Thanks for your answers.

> Fredrik, what indexes did Oracle choose? (index-only scan?)

Oracle chooses a plan which looks like this:

SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=7 Bytes=182)
  VIEW OF 'CONNECTIONS_V' (VIEW) (Cost=5 Card=7 Bytes=182)
    UNION-ALL
      INLIST ITERATOR
        TABLE ACCESS (BY INDEX ROWID) OF 'CONNECTIONS' (TABLE) (Cost=5 Card=6 Bytes=54)
          INDEX (RANGE SCAN) OF 'CONNECTIONS_NODE_IDX' (INDEX) (Cost=4 Card=6)
      INLIST ITERATOR
        TABLE ACCESS (BY INDEX ROWID) OF 'CONNECTIONS_LOCKED' (TABLE) (Cost=0 Card=1 Bytes=39)
          INDEX (RANGE SCAN) OF 'CONNECTIONS_LOCKED_NODE_IDX' (INDEX) (Cost=0 Card=1)

This means that only the indexes on connections.node and
connections_locked.node are used. I don't think we want to use any index on
locked_by here; we are hoping for the node = value predicate to be pushed
into both halves of the union all view (not sure if this is the right
terminology).

For example, in the simplified-but-still-problematic query

select con2.obj_id from connections_v con2 where con2.node in (select 1015);

we are hoping for the node index to be used for both connections and
connections_locked. We hope to get the same plan/performance as for this
query:

select con2.obj_id from connections_v con2 where con2.node in (1015);

I don't understand why there is a difference between in (select 1015) and
in (1015).

> That said, note that index usage depends on your data distribution:
> postgres may identify that it'll read most/all of the table anyway, and
> opt to do a (cheaper) seq scan instead.

Yes, I know, but I've tried to create the test case data distribution in a
way I hope makes this unlikely (0.5 million rows in one table, 25000 in the
other, two rows in each table for each distinct value of node, only a few
rows returned from the queries).
Thanks again for your answers so far

/Fredrik
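The two query forms Fredrik compares can be run side by side against the thread's test case:

```sql
-- Per the thread, this form does NOT get the node predicate pushed
-- into both UNION ALL branches, so the big table is seq-scanned.
explain analyze
select con2.obj_id from connections_v con2 where con2.node in (select 1015);

-- With a literal IN list, the predicate is pushed down and both
-- node indexes are used.
explain analyze
select con2.obj_id from connections_v con2 where con2.node in (1015);
```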
[PERFORM] Link error when use Pgtypes function in windows
Hi:

I installed PostgreSQL 9.0 from EnterpriseDB with the "one click installer"
on Windows 7 32-bit, and use Microsoft Visual Studio 2010 C++. I added
libpq.lib to the link properties of the project, and also included the lib
folder and path. The .c and .cpp files compile successfully after converting
the .pgc file to a .c file using ECPG, but linking always fails with errors
like this:

error LNK2019: unresolved external symbol _PGTYPESinterval_new referenced in function.

All of the PGTYPEStimestamp* and PGTYPESinterval* functions cause the same
error. Can someone help me? Thanks.

Fanbin
Re: [PERFORM] Link error when use Pgtypes function in windows
> Can someone help me?

You may want to try pgsql-general instead of this list.

---
Maciek Sakrejda | System Architect | Truviso
1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure mmonc...@gmail.com wrote:
> On Tue, Apr 26, 2011 at 4:37 PM, Kevin Grittner
> kevin.gritt...@wicourts.gov wrote:
>> Sok Ann Yap sok...@gmail.com wrote:
>>> So, index scan wins by a very small margin over sequential scan after
>>> the tuning. I am a bit puzzled because index scan is more than 3000
>>> times faster in this case, but the estimated costs are about the same.
>>> Did I do something wrong?
>>
>> Tuning is generally needed to get best performance from PostgreSQL.
>> Needing to reduce random_page_cost is not unusual in situations where a
>> good portion of the active data is in cache (between shared_buffers and
>> the OS cache). Please show us your overall configuration and give a
>> description of the hardware (how many of what kind of cores, how much
>> RAM, what sort of storage system). The configuration part can be
>> obtained by running the query on this page and pasting the result into
>> your next post: http://wiki.postgresql.org/wiki/Server_Configuration
>>
>> There are probably some other configuration adjustments you could do to
>> ensure that good plans are chosen.
>
> The very first thing to check is effective_cache_size and to set it to a
> reasonable value.

Actually, effective_cache_size has no impact on costing except when planning
a nested loop with inner index scan. So, a query against a single table can
never benefit from changing that setting. Kevin's suggestion of adjusting
seq_page_cost and random_page_cost is the way to go.

We've talked in the past (and I still think it's a good idea, but haven't
gotten around to doing anything about it) about adjusting the planner to
attribute to each relation the percentage of its pages which we believe
we'll find in cache. Although many complicated ideas for determining that
percentage have been proposed, my favorite one is fairly simple: assume that
small relations will be mostly or entirely cached, and that big ones won't
be. Allow the administrator to override the result on a per-relation basis.
It's difficult to imagine a situation where the planner should assume that a
relation with only a handful of pages isn't going to be cached. Even if it
isn't, as soon as someone begins accessing it, it will be.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
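The "small relation" part of the proposed heuristic could lean on information the planner already tracks; for instance, relpages in pg_class reports each table's size in 8kB pages (the per-relation override itself is only a proposal here, not an existing setting):

```sql
-- List the largest user tables by page count; anything with a small
-- relpages value would be presumed cached under the heuristic.
select relname, relpages,
       pg_size_pretty(pg_relation_size(oid)) as size
from pg_class
where relkind = 'r'
order by relpages desc
limit 10;
```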
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Robert Haas robertmh...@gmail.com wrote:
> We've talked in the past (and I still think it's a good idea, but haven't
> gotten around to doing anything about it) about adjusting the planner to
> attribute to each relation the percentage of its pages which we believe
> we'll find in cache. Although many complicated ideas for determining that
> percentage have been proposed, my favorite one is fairly simple: assume
> that small relations will be mostly or entirely cached, and that big ones
> won't be. Allow the administrator to override the result on a per-relation
> basis. It's difficult to imagine a situation where the planner should
> assume that a relation with only a handful of pages isn't going to be
> cached. Even if it isn't, as soon as someone begins accessing it, it will
> be.

Simple as the heuristic is, I bet it would be effective. While one can
easily construct a synthetic case where it falls down, the ones I can think
of aren't all that common, and you are suggesting an override mechanism.

-Kevin
Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move
On Wed, May 4, 2011 at 6:31 AM, Willy-Bas Loos willy...@gmail.com wrote:
> I'm asking them for (real) benchmarks, thanks for the advice. (fio is not
> available for us now to do it myself, grmbl) It just occurred to me that
> it is not necessarily the case that reading the indexes causes a lot of
> random I/O (on the index itself). I mean, maybe the index is generally
> read sequentially and then, when retrieving the data, there is a lot of
> random I/O. If it's a long story, any tips for info about this (book or
> web site)?

If you don't do anything special, and if the query plan says Index Scan
rather than Bitmap Index Scan, then both the index I/O and the table I/O are
likely to be fairly random. However, there are a number of cases in which
you can expect the table I/O to be sequential:

- In some cases, you may happen to insert rows with an ordering that matches
  the index. For example, if you have a table with not too many updates and
  deletes, and an index on a serial column, then new rows will have a higher
  value in that column than old rows, and will also typically be physically
  after older rows in the file. Or you might be inserting timestamped data
  from oldest to newest.

- If the planner chooses a Bitmap Index Scan, it effectively scans the index
  to figure out which table blocks to read, and then reads those table
  blocks in block number order, so that the I/O is sequential, with skips.

- If you CLUSTER the table on a particular index, it will be physically
  ordered to match the index's key ordering. As the table is further
  modified, the degree of clustering will gradually decline; eventually you
  may wish to re-CLUSTER.

It's also worth keeping in mind that the index itself won't necessarily be
accessed in physically sequential order. The point of the index is to emit
the rows in key order, but if the table is heavily updated, it won't
necessarily be the case that a page containing lower-valued keys physically
precedes a page containing higher-valued keys.
I'm actually somewhat fuzzy on how this works, and to what extent it's a
problem in practice, but I am fairly sure it can happen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
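The CLUSTER case above can be sketched with placeholder table and index names; pg_stats.correlation then shows how closely heap order tracks the column's sort order:

```sql
-- Rewrite the table in the index's key order (takes an exclusive lock).
cluster my_table using my_table_created_at_idx;

-- correlation is near 1.0 right after clustering and drifts back
-- toward 0 as rows are inserted and updated.
select tablename, attname, correlation
from pg_stats
where tablename = 'my_table' and attname = 'created_at';
```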
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Robert Haas robertmh...@gmail.com writes:
> On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure mmonc...@gmail.com wrote:
>> The very first thing to check is effective_cache_size and to set it to a
>> reasonable value.
>
> Actually, effective_cache_size has no impact on costing except when
> planning a nested loop with inner index scan. So, a query against a
> single table can never benefit from changing that setting.

That's flat out wrong. It does affect the cost estimate for plain indexscan
(and bitmap indexscan) plans.

regards, tom lane
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Fri, May 13, 2011 at 3:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> Actually, effective_cache_size has no impact on costing except when
>> planning a nested loop with inner index scan. So, a query against a
>> single table can never benefit from changing that setting.
>
> That's flat out wrong. It does affect the cost estimate for plain
> indexscan (and bitmap indexscan) plans.

*rereads code*

OK, I agree. I obviously misinterpreted this code the last time I read it.
I guess maybe the reason why it didn't matter for the OP is that - if the
size of the index in pages is smaller than the pro-rated fraction of
effective_cache_size allowed to the index - then the exact value doesn't
affect the answer. I apparently need to study this code more.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
> I guess maybe the reason why it didn't matter for the OP is that - if the
> size of the index in pages is smaller than the pro-rated fraction of
> effective_cache_size allowed to the index - then the exact value doesn't
> affect the answer. I apparently need to study this code more.

FWIW: random_page_cost is meant to be the ratio between the cost of looking
up a single row as an index lookup, and the cost of looking up that same row
as part of a larger sequential scan. For specific storage, that coefficient
should be roughly the same regardless of the table size. So if your plan for
optimization involves manipulating RPC for anything other than a change of
storage, you're Doing It Wrong. Instead, we should be fixing the formulas
these are based on and leaving RPC alone.

For any data page, there are actually four costs associated with each tuple
lookup, per:

  in-memory/seq    | on-disk/seq
  -----------------+-----------------
  in-memory/random | on-disk/random

(yes, there's actually more for bitmapscan etc., but the example holds)

For any given tuple lookup, then, you can assign a cost based on where you
think that tuple falls in that quadrant map. Since this is all
probability-based, you'd be assigning a cost as a mixed % of in-memory and
on-disk costs. Improvements in accuracy of this formula would come through
improvements in accuracy in predicting if a particular data page will be in
memory.

This is what the combination of random_page_cost and effective_cache_size
ought to supply, but I don't think it does, quite.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
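Josh's "mixed %" costing can be illustrated with a toy calculation; the 90% in-cache probability and the per-page costs below are invented numbers, not planner output:

```sql
-- expected page cost = P(in memory) * memory cost + P(on disk) * disk cost
select 0.90 * 0.01   -- in-memory page: nearly free
     + 0.10 * 4.0    -- on-disk random page at default random_page_cost
       as blended_random_page_cost;  -- = 0.409
```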
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
2011/5/13 Josh Berkus j...@agliodbs.com:
> FWIW: random_page_cost is meant to be the ratio between the cost of
> looking up a single row as an index lookup, and the cost of looking up
> that same row as part of a larger sequential scan. [...]
>
> For any data page, there are actually four costs associated with each
> tuple lookup, per:
>
>   in-memory/seq    | on-disk/seq
>   -----------------+-----------------
>   in-memory/random | on-disk/random

It lacks some more theoretical ones, like sort_page/temp_page: those are
based on a ratio of seq_page_cost and random_page_cost, or on seq_page_cost
alone (when working out of work_mem). Memory access is accounted for with
some 0.1 in some places, AFAIR. (And memory random/seq is the same at the
level of estimation we do.)

> (yes, there's actually more for bitmapscan etc., but the example holds)

(If I read the sources correctly, for this one there is a linear approach to
weight the cost between random_page_cost and seq_page_cost on the heap page
fetch, plus the Mackert and Lohman formula, if needed, in its best usage:
predicting what should be in cache *because* of the current query execution,
not because of the current status of the page cache.)

> For any given tuple lookup, then, you can assign a cost based on where you
> think that tuple falls in that quadrant map. Since this is all
> probability-based, you'd be assigning a cost as a mixed % of in-memory and
> on-disk costs.
> Improvements in accuracy of this formula would come through improvements
> in accuracy in predicting if a particular data page will be in memory.
> This is what the combination of random_page_cost and effective_cache_size
> ought to supply, but I don't think it does, quite.

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/
PostgreSQL : Expertise, Formation et Support
[PERFORM] Using pgiosim realistically
Hi all: I am adding pgiosim to our testing for new database hardware and I am seeing something I don't quite get, and I think it's because I am using pgiosim incorrectly.

Specs:
OS: centos 5.5
kernel: 2.6.18-194.32.1.el5
memory: 96GB
cpu: 2x Intel(R) Xeon(R) X5690 @ 3.47GHz (6 core, ht enabled)
disks: WD2003FYYS RE4
raid: lsi 9260-4i with 8 disks in raid 10 configuration, 1MB stripe size, raid cache enabled w/ bbu, disk caches disabled
filesystem: ext3 created with -E stride=256

I am seeing really poor (~70) iops with pgiosim. According to http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html, in the database benchmark they are seeing ~170 iops on a single disk for these drives. I would expect an 8 disk raid 10 to get better than 3x the single disk rate (assuming the data is randomly distributed). To test I am using 5 100GB files with:

sudo ~/pgiosim -c -b 100G -v file?

I am using 100G sizes to make sure that the data read and file sizes exceed the memory size of the system. However if I use 5 1GB files (and still 100GB read data) I see 200+ to 400+ iops at 50% of the 100GB of data read, which I assume means that the data is cached in the OS cache and I am not really getting a hard drive/raid measurement of iops.

However, IIUC postgres will never have an index file greater than 1GB in size (http://www.postgresql.org/docs/8.4/static/storage-file-layout.html) and will just add 1GB segments, so the 1GB file size seems to be more realistic. So do I want 100 (or probably 2 or 3 times more, say 300) 1GB files to feed pgiosim? That way I will have enough data that not all of it can be cached in memory, and the file sizes (and file operations: open/close) more closely match what postgres is doing with index files?
Also in the output of pgiosim I see:

25.17%, 2881 read, 0 written, 2304.56kB/sec 288.07 iops

which I interpret (left to right) as the % of the 100GB that has been read, the number of read operations over some time period, the number of bytes read/written, and the I/O operations/sec. Iops always seems to be 1/10th of the read number (rounded up to an integer). Is this expected, and if so does anybody know why?

While this is running, if I also run iostat -p /dev/sdc 5 I see:

Device:   tps      Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sdc       166.40   2652.80     4.80        13264     24
sdc1      2818.80  1.20        999.20      6         4996

which I am interpreting as 2818 read/io operations (corresponding more or less to read in the pgiosim output) to the partition, and of those only 166 are actually going to the drive??? with the rest handled from OS cache. However the tps isn't increasing when I see pgiosim reporting:

48.47%, 4610 read, 0 written, 3687.62kB/sec 460.95 iops

An iostat 5 output near the same time is reporting:

Device:   tps      Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sdc       165.87   2647.50     4.79        13264     24
sdc1      2812.97  0.60        995.41      3         4987

so I am not sure if there is a correlation between the read and tps numbers. Also I am assuming blks written is filesystem metadata, although that seems like a lot of data. If I stop pgiosim, the iostat output drops to 0 writes and reads as expected.

So does anybody have any comments on how to test with pgiosim and how to correlate the iostat and pgiosim outputs? Thanks for your feedback.

--
rouilj
John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell) 603-643-9300 x 111
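One piece of the correlation puzzle is units: iostat's Blk_read/s counts 512-byte sectors on Linux, while pgiosim reports kB/sec, so the two throughput figures need converting before they can be compared. A small sketch (the function name is made up for illustration):

```python
# Sketch: convert iostat's Blk_read/s (512-byte sectors on Linux) to
# kB/sec so it can be compared directly with pgiosim's reported rate.
SECTOR_BYTES = 512

def blk_per_sec_to_kb_per_sec(blk_read_per_sec):
    return blk_read_per_sec * SECTOR_BYTES / 1024.0

# From the iostat output quoted above: sdc shows 2652.80 blocks/sec,
# i.e. ~1326 kB/sec of reads actually reaching the device, versus the
# 2300-3700 kB/sec pgiosim reports reading -- the gap being reads
# satisfied from the OS page cache rather than the drive.
device_kb = blk_per_sec_to_kb_per_sec(2652.80)
```

Comparing device-level kB/sec against pgiosim's kB/sec this way gives a rough cache-hit fraction for each sampling interval.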
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Fri, May 13, 2011 at 4:13 PM, Josh Berkus j...@agliodbs.com wrote: Instead, we should be fixing the formulas these are based on and leaving RPC alone. For any data page, there are actually four costs associated with each tuple lookup, per:

All true. I suspect that in practice the difference between random and sequential memory page costs is small enough to be ignorable, although of course I might be wrong. I've never seen a database that was fully cached in memory where it was necessary to set random_page_cost > seq_page_cost to get good plans -- no doubt partly because even if the pages were consecutive on disk, there's no reason to suppose they would be so in memory, and we certainly wouldn't know one way or the other at planning time. But I agree we should add a cached_page_cost as part of all this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Query improvement
On Mon, May 2, 2011 at 3:58 AM, Claudio Freire klaussfre...@gmail.com wrote: Hash joins are very inefficient if they require big temporary files.

Hmm, that's not been my experience. What have you seen? I've seen a 64-batch hash join beat out a nested-loop-with-inner-indexscan, which I never woulda believed, but...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
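For readers unfamiliar with the "64-batch" reference: when the build side of a hash join exceeds work_mem, the executor partitions both inputs into batches by hash and joins each batch pair independently (the classic Grace hash join idea). A minimal in-memory sketch of that technique, not PostgreSQL's implementation, which spills batches to temp files:

```python
# Minimal sketch of a batched (Grace) hash join: partition both inputs
# into n_batches by hashing the join key, then run a plain in-memory
# hash join per batch pair. Illustrative only -- PostgreSQL's executor
# writes batches to temporary files rather than keeping them in lists.

def batched_hash_join(build_rows, probe_rows, key, n_batches=4):
    build_batches = [[] for _ in range(n_batches)]
    probe_batches = [[] for _ in range(n_batches)]
    # Rows with equal keys land in the same batch on both sides.
    for row in build_rows:
        build_batches[hash(key(row)) % n_batches].append(row)
    for row in probe_rows:
        probe_batches[hash(key(row)) % n_batches].append(row)
    out = []
    for b_rows, p_rows in zip(build_batches, probe_batches):
        # Classic in-memory hash join within one batch.
        table = {}
        for row in b_rows:
            table.setdefault(key(row), []).append(row)
        for row in p_rows:
            for match in table.get(key(row), []):
                out.append((match, row))
    return out
```

Each batch only needs enough memory for its own hash table, which is why a 64-batch join can still be fast: the temp-file traffic is sequential, while the nested-loop alternative pays a random index probe per outer row.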