[PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space
Hi List;

I have a table with 9,961,914 rows in it (see the describe of bigtab_stats_fact_tmp14 below). I also have a table with 7,785 rows in it (see the describe of xsegment_dim below). I'm running the join shown below, and it takes more than 10 hours and eventually runs out of disk space on a 1.4TB file system. I've included below a describe of both tables, the join, and an explain plan; any help / suggestions would be much appreciated! I need to get this beast to run as quickly as possible (without filling up my file system). Thanks in advance...

select f14.xpublisher_dim_id, f14.xtime_dim_id, f14.xlocation_dim_id,
       f14.xreferrer_dim_id, f14.xsite_dim_id, f14.xsystem_cfg_dim_id,
       f14.xaffiliate_dim_id, f14.customer_id, pf_dts_id, episode_id,
       sessionid, bytes_received, bytes_transmitted, total_played_time_sec,
       segdim.xsegment_dim_id as episode_level_segid
from bigtab_stats_fact_tmp14 f14, xsegment_dim segdim
where f14.customer_id = segdim.customer_srcid
  and f14.show_id = segdim.show_srcid
  and f14.season_id = segdim.season_srcid
  and f14.episode_id = segdim.episode_srcid
  and segdim.segment_srcid is NULL;

                                   QUERY PLAN
-------------------------------------------------------------------------------
 Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
   Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND (segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid = f14.show_id) AND (segdim.season_srcid = f14.season_id))
   ->  Sort  (cost=1570.35..1579.46 rows=3643 width=40)
         Sort Key: segdim.episode_srcid, segdim.customer_srcid, segdim.show_srcid, segdim.season_srcid
         ->  Seq Scan on xsegment_dim segdim  (cost=0.00..1354.85 rows=3643 width=40)
               Filter: (segment_srcid IS NULL)
   ->  Sort  (cost=1755323.26..1780227.95 rows=9961874 width=126)
         Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id
         ->  Seq Scan on bigtab_stats_fact_tmp14 f14  (cost=0.00..597355.74 rows=9961874 width=126)
(9 rows)

# \d bigtab_stats_fact_tmp14
            Table "public.bigtab_stats_fact_tmp14"
          Column           |            Type             | Modifiers
---------------------------+-----------------------------+-----------
 pf_dts_id                 | bigint                      |
 pf_device_id              | bigint                      |
 segment_id                | bigint                      |
 cdn_id                    | bigint                      |
 collector_id              | bigint                      |
 digital_envoy_id          | bigint                      |
 maxmind_id                | bigint                      |
 quova_id                  | bigint                      |
 website_id                | bigint                      |
 referrer_id               | bigint                      |
 affiliate_id              | bigint                      |
 custom_info_id            | bigint                      |
 start_dt                  | timestamp without time zone |
 total_played_time_sec     | numeric(18,5)               |
 bytes_received            | bigint                      |
 bytes_transmitted         | bigint                      |
 stall_count               | integer                     |
 stall_duration_sec        | numeric(18,5)               |
 hiccup_count              | integer                     |
 hiccup_duration_sec       | numeric(18,5)               |
 watched_duration_sec      | numeric(18,5)               |
 rewatched_duration_sec    | numeric(18,5)               |
 requested_start_position  | numeric(18,5)               |
 requested_stop_position   | numeric(18,5)               |
 post_position             | numeric(18,5)               |
 is_vod                    | numeric(1,0)                |
 sessionid                 | bigint                      |
 create_dt                 | timestamp without time zone |
 segment_type_id           | bigint                      |
 customer_id               | bigint                      |
 content_publisher_id      | bigint                      |
 content_owner_id          | bigint                      |
 episode_id                | bigint                      |
 duration_sec              | numeric(18,5)               |
 device_id                 | bigint                      |
 os_id                     | bigint                      |
 browser_id                | bigint                      |
 cpu_id                    | bigint                      |
 xsystem_cfg_dim_id        | bigint                      |
 xreferrer_dim_id          | bigint                      |
 xaffiliate_dim_id         | bigint                      |
 xsite_dim_id              | bigint                      |
 xpublisher_dim_id         | bigint                      |
 season_id                 | bigint                      |
 show_id                   | bigint                      |
 xsegment_dim_id           | bigint                      |
 location_id               | bigint                      |
 zipcode                   | character varying(20)       |
 xlocation_dim
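One thing that stands out in the plan is the estimated output of 3,191,677,219 rows from joining a 9.96M-row fact table to a 3,643-row dimension slice — that suggests the four join keys match many-to-many rather than acting as a lookup. A hedged first diagnostic (the index name and work_mem value below are illustrative, not from the thread):

```sql
-- Check whether the four join keys are actually unique in the dimension
-- slice; duplicates here would explain the row-count explosion.
SELECT customer_srcid, show_srcid, season_srcid, episode_srcid, count(*)
FROM xsegment_dim
WHERE segment_srcid IS NULL
GROUP BY 1, 2, 3, 4
HAVING count(*) > 1;

-- If the keys turn out to be unique, a partial composite index plus more
-- sort memory for this session may let the planner avoid the huge sort:
CREATE INDEX xsegment_dim_lookup_idx
    ON xsegment_dim (customer_srcid, show_srcid, season_srcid, episode_srcid)
    WHERE segment_srcid IS NULL;
SET work_mem = '256MB';
```

If the first query returns rows, the disk-filling behavior is the join itself producing billions of rows, and no amount of tuning will help until the join keys are deduplicated.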
Re: [PERFORM] I/O on select count(*)
>>> On Thu, May 15, 2008 at 5:11 PM, in message <[EMAIL PROTECTED]>, James Mansion <[EMAIL PROTECTED]> wrote:

> Alvaro Herrera wrote:
>> Hint bits are used to mark tuples as created and/or deleted by
>> transactions that are known committed or aborted. To determine the
>> visibility of a tuple without such bits set, you need to consult pg_clog
>> and possibly pg_subtrans, so it is an expensive check. On the other
>
> So, how come there is this outstanding work to do, which will inevitably
> be done, and it hasn't been done until it is 'just too late' to avoid
> getting in the way of the query?

There has been discussion from time to time about setting the hint bits for tuple inserts which occur within the same database transaction as the creation of the table into which they're being inserted. That would allow people to cover many of the bulk load situations. I don't see it on the task list. (I would also argue that there is little information lost, even from a forensic perspective, in writing such rows as "frozen".) Is this idea done, dead, or is someone working on it?

If we could set hint bits on dirty buffer pages after the commit, we'd cover the OLTP situation. In many situations there is a bigger OS cache than PostgreSQL shared memory, and an attempt to set the bits soon after the commit would coalesce the two writes into one physical write using RAM-based access, which would be almost as good. I don't know if it's feasible to try to do that after the pages have moved from the PostgreSQL cache to the OS cache, but it would likely be a performance win.

If we are going to burden any requester process with the job of setting the hint bits, it would typically be better to burden the one doing the data modification rather than some random thread later trying to read data from the table. Of course, getting work off the requester processes onto some background worker process is generally even better.
-Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] I/O on select count(*)
"Luke Lonergan" <[EMAIL PROTECTED]> writes:

> BTW we've removed HINT bit checking in Greenplum DB and improved the
> visibility caching which was enough to provide performance at the same level
> as with the HINT bit optimization, but avoids this whole "write the data,
> write it to the log also, then write it again just for good measure"
> behavior.
>
> For people doing data warehousing work like the poster, this Postgres
> behavior is miserable. It should be fixed for 8.4 for sure (volunteers?)

For people doing data warehousing I would think the trick would be to do something like what we do to avoid WAL logging for tables created in the same transaction. That is, if you're loading a lot of data at the same time then all of that data is going to be aborted or committed and that will happen at the same time. Ideally we would find a way to insert the data with the hint bits already set to committed and mark the section of the table as being only provisionally extended so other transactions wouldn't even look at those pages until the transaction commits.

This is similar to the abortive attempt to have the abovementioned WAL logging trick insert the records pre-frozen. I recall there were problems with that idea though, but I don't recall if they were insurmountable or just required more work.

-- 
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
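The same-transaction trick Greg refers to looks roughly like this in practice; a hedged sketch, where the table and file names are illustrative, and the WAL bypass for COPY applies only when WAL archiving is not enabled:

```sql
-- Create and load in ONE transaction: with archiving off, PostgreSQL can
-- skip WAL logging for the COPY, because a crash simply makes the whole
-- new table vanish along with the aborted transaction.
BEGIN;
CREATE TABLE accounts_load (LIKE accounts);
COPY accounts_load FROM '/tmp/accounts.csv' WITH CSV;
COMMIT;
```

The open question in this subthread is whether the rows written this way could also carry their hint bits (or be frozen outright) at load time, rather than being rewritten on first read.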
Re: [PERFORM] I/O on select count(*)
Alvaro Herrera wrote:
> Hint bits are used to mark tuples as created and/or deleted by
> transactions that are known committed or aborted. To determine the
> visibility of a tuple without such bits set, you need to consult pg_clog
> and possibly pg_subtrans, so it is an expensive check. On the other

So, how come there is this outstanding work to do, which will inevitably be done, and it hasn't been done until it is 'just too late' to avoid getting in the way of the query? The OP didn't suggest that he had just loaded the data.

Also - is it the case that this only affects the case where updated pages were spilled during the transaction that changed them? That is, if we commit a transaction and there are changed rows still in the cache, since their pages have not been evicted yet, are the hint bits set immediately so that each page is written just once? That would seem to be the common case in most OLTP systems.

Heikki points out that the list might get big and need to be abandoned, but then you fall back to scheduling a clog scan that can apply the bits, which does what you have now, though hopefully in a way that fills slack disk I/O rather than waiting for the read.

Matthew says: 'it would be a list of changes since the last checkpoint', but I don't see why you can't start writing hints to in-memory pages as soon as the transaction ends. You might fall behind, but I doubt it with modern CPU speeds.

I can't see why Pavan's suggestion to try to update as many of the bits as possible when a dirty page is evicted would be contentious.

I do think this is something of interest to users, not just developers, since it may influence the way updates are processed where it is reasonable to do so in 'bite sized chunks' as a multipart workflow.
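For the bulk-load side of this discussion, the usual workaround today is to pay the hint-bit write cost once, deliberately, right after loading, instead of letting the first big read query pay it. A hedged sketch (table and path are illustrative):

```sql
-- Load, then vacuum before the first reader arrives: the VACUUM scan sets
-- hint bits (and gathers planner stats), so subsequent SELECTs read clean
-- pages instead of dirtying and rewriting every page they touch.
COPY accounts FROM '/tmp/accounts.csv' WITH CSV;
VACUUM ANALYZE accounts;
SELECT count(*) FROM accounts;
```

This doesn't help the OLTP case James raises, where pages are dirtied continuously rather than in one identifiable load step.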
Re: [PERFORM] I/O on select count(*)
Tom Lane wrote:
> Hmm, the problem would be trying to figure out what percentage of writes
> could be blamed solely on hint-bit updates and not any other change to
> the page. I don't think that the bufmgr currently keeps enough state to
> know that, but you could probably modify it easily enough, since callers
> distinguish MarkBufferDirty from SetBufferCommitInfoNeedsSave. Define
> another flag bit that's set only by the first, and test it during
> write-out.

Ok, I made a few changes to bufmgr per my understanding of your description above and with my limited understanding of the code. Patch is attached. Assuming the patch is correct, the effect of writes due to hint bits is quite significant. I collected the data below by running pgbench in one terminal and psql in another to run the query. Is the data plausible?

-Robert

--------------------------------------------------
Backend PID: 16189
SQL Statement : select count(*) from accounts;
Execution time : 17.33 sec

Buffer Read Counts
 Tablespace  Database  Table  Count
       1663     16384   2600      1
       1663     16384   2601      1
       1663     16384   2615      1
       1663     16384   1255      2
       1663     16384   2602      2
       1663     16384   2603      2
       1663     16384   2616      2
       1663     16384   2650      2
       1663     16384   2678      2
       1663     16384   1247      3
       1663     16384   1249      3
       1663     16384   2610      3
       1663     16384   2655      3
       1663     16384   2679      3
       1663     16384   2684      3
       1663     16384   2687      3
       1663     16384   2690      3
       1663     16384   2691      3
       1663     16384   2703      4
       1663     16384   1259      5
       1663     16384   2653      5
       1663     16384   2662      5
       1663     16384   2663      5
       1663     16384   2659      7
       1663     16384  16397   8390

Dirty Buffer Write Counts
 Tablespace  Database  Table  Count
       1663     16384  16402      2
       1663     16384  16394     11
       1663     16384  16397   4771

Hint Bits Write Counts
 Tablespace  Database  Table  Count
       1663     16384  16397   4508

Total buffer cache hits  :   732
Total buffer cache misses:  7731
Average read time from cache :   9136 (ns)
Average read time from disk  : 384201 (ns)
Average write time to disk   : 210709 (ns)

Backend PID: 16189
SQL Statement : select count(*) from accounts;
Execution time : 12.72 sec

Buffer Read Counts
 Tablespace  Database  Table  Count
       1663     16384  16397   8392

Dirty Buffer Write Counts
 Tablespace  Database  Table  Count
       1663     16384  16394      6
       1663     16384  16402      7
       1663     16384  16397   2870

Hint Bits Write Counts
 Tablespace  Database  Table  Count
       1663     16384  16402      2
       1663     16384  16397   2010

Total buffer cache hits  :   606
Total buffer cache misses:  7786
Average read time from cache :   6949 (ns)
Average read time from disk  : 706288 (ns)
Average write time to disk   :  90426 (ns)

Index: bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.228
diff -u -3 -p -r1.228 bufmgr.c
--- bufmgr.c	1 Jan 2008 19:45:51 -	1.228
+++ bufmgr.c	15 May 2008 20:56:38 -
@@ -42,6 +42,7 @@
 #include "storage/smgr.h"
 #include "utils/resowner.h"
 #include "pgstat.h"
+#include "pg_trace.h"

 /* Note: these two macros only work on shared buffers, not local ones! */

@@ -171,6 +172,7 @@ ReadBuffer_common(Relation reln, BlockNu
 	if (isExtend)
 		blockNum = smgrnblocks(reln->rd_smgr);

+	TRACE_POSTGRESQL_BUFFER_READ_START(blockNum, reln->rd_node.spcNode, reln->rd_node.dbNode, reln->rd_node.relNode, isLocalBuf);
 	pgstat_count_buffer_read(reln);

 	if (isLocalBuf)
@@ -200,12 +202,16 @@ ReadBuffer_common(Relation reln, BlockNu
 	{
 		if (!isExtend)
 		{
+			TRACE_POSTGRESQL_BUFFER_HIT();
 			/* Just need to update stats before we exit */
 			pgstat_count_buffer_hit(reln);

 			if (VacuumCostActive)
 				VacuumCostBalance += VacuumCostPageHit;

+			TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum,
+				reln->rd_node.spcNode, reln->rd_node.dbNode,
+				reln->rd_node.relNode, isLocalBuf, found);
 			return BufferDescriptorGetBuffer(bufHdr);
 		}
@@ -257,6 +263,7 @@ ReadBuffer_common(Relation reln, BlockNu
 		} while (!StartBufferIO(bufHdr, true));
 	}

+	TRACE_POSTGRESQL_BUFFER_MISS();
 	/*
 	 * if we have go
Re: [PERFORM] Update performance degrades over time
Are there any system catalog views I can check for wait events causing slower response times? Thanks in advance.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Subbiah Stalin
Sent: Thursday, May 15, 2008 9:28 AM
To: Jeffrey Baker; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Update performance degrades over time

Yes, we are updating one of the indexed timestamp columns, which gets a unique value on every update. We tried setting autovacuum_vacuum_scale_factor = 0.1 (from the default) to make autovacuum a bit more aggressive; we see bloating on both the table and its indexes, but it's creeping up slowly. Anyway, even with slower bloating, I still see update performance degrade, with 15 sec response times captured by setting log_min_duration_statement. It looks like bloating isn't what's causing the slower updates. Any help/ideas to tune this are appreciated. The explain plan seems reasonable for the update statement:

update tablexy set col2=$1, col9=$2, col10=$3, col11=$4, col3=$5 WHERE ID=$6;

QUERY PLAN
Index Scan using ct_tablexy_id_u1 on tablexy (cost=0.00..8.51 rows=1 width=194) (actual time=0.162..0.166 rows=1 loops=1)
  Index Cond: ((id)::text = '32xka8axki8'::text)

Thanks in advance.
Stalin

-----Original Message-----
From: Jeffrey Baker [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 15, 2008 6:56 AM
To: Subbiah Stalin-XCGF84; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Update performance degrades over time

On Wed, May 14, 2008 at 6:31 PM, Subbiah Stalin-XCGF84 <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> We are doing some load tests with our application running postgres
> 8.2.4. At times we see updates on a table taking longer (around
> 11-16 secs) than the expected sub-second response time. The table in
> question is getting updated constantly through the load tests. In
> checking the table size including indexes, they seem to be bloated; we got
> it confirmed after recreating it (stats below). We have autovacuum
> enabled with default parameters. I thought autovacuum would avoid
> bloating issues, but it looks like it's not aggressive enough. Wondering if
> table/index bloating is causing update slowness over a period of
> time. Any ideas how to troubleshoot this further?

Sometimes it is necessary to not only VACUUM, but also REINDEX. If your update changes an indexed column to a new, distinct value, you can easily get index bloat. Also, you should check to see if you have any old, open transactions on the same instance. If you do, it's possible that VACUUM will have no beneficial effect.

-jwb
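On the catalog-view question, a hedged sketch of what can be checked on 8.2 (`tablexy` is the table from the thread; note that pg_class figures are only as fresh as the last VACUUM/ANALYZE):

```sql
-- Pages vs. estimated live tuples, and when autovacuum last ran; a
-- relpages count that keeps growing while reltuples stays flat is the
-- bloat signature described above.
SELECT c.relname, c.relpages, c.reltuples,
       s.last_vacuum, s.last_autovacuum
FROM pg_class c
JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE c.relname = 'tablexy';

-- Old, open transactions (per Jeffrey's suggestion) and waiting backends:
SELECT procpid, waiting, query_start, current_query
FROM pg_stat_activity
ORDER BY query_start;
```

There is no per-statement wait-event view in 8.2; `pg_stat_activity.waiting` plus `pg_locks` is about as close as it gets.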
Re: [PERFORM] I/O on select count(*)
Robert Lor <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> It's certainly true that hint-bit updates cost something, but
>> quantifying how much isn't easy.
>
> Maybe we can instrument the code with DTrace probes to quantify the
> actual costs.

Hmm, the problem would be trying to figure out what percentage of writes could be blamed solely on hint-bit updates and not any other change to the page. I don't think that the bufmgr currently keeps enough state to know that, but you could probably modify it easily enough, since callers distinguish MarkBufferDirty from SetBufferCommitInfoNeedsSave. Define another flag bit that's set only by the first, and test it during write-out.

			regards, tom lane
Re: [PERFORM] I/O on select count(*)
Tom Lane wrote:
> It's certainly true that hint-bit updates cost something, but
> quantifying how much isn't easy.

Maybe we can instrument the code with DTrace probes to quantify the actual costs. I'm not familiar with the code, but if I know where to place the probes, I can easily do a quick test and provide the data.

> The off-the-cuff answer is to do the select count(*) twice and see how
> much cheaper the second one is.

It doesn't seem that the second run is cheaper, as shown in the results below. The data came from the probes I've added recently.

*** Run #1 ***
SQL Statement : select count(*) from accounts;
Execution time : 1086.58 (ms)

Buffer Read Counts
 Tablespace  Database  Table  Count
       1663     16384   1247      1
       1663     16384   2600      1
       1663     16384   2703      1
       1663     16384   1255      2
       1663     16384   2650      2
       1663     16384   2690      3
       1663     16384   2691      3
       1663     16384  16397   8390

Dirty Buffer Write Counts
 Tablespace  Database  Table  Count
       1663     16384  16397   2865

Total buffer cache hits  :  1932
Total buffer cache misses:  6471
Average read time from cache :   5638 (ns)
Average read time from disk  : 143371 (ns)
Average write time to disk   :  20368 (ns)

*** Run #2 ***
SQL Statement : select count(*) from accounts;
Execution time : 1115.94 (ms)

Buffer Read Counts
 Tablespace  Database  Table  Count
       1663     16384  16397   8390

Dirty Buffer Write Counts
 Tablespace  Database  Table  Count
       1663     16384  16397   2865

Total buffer cache hits  :  1931
Total buffer cache misses:  6459
Average read time from cache :   4357 (ns)
Average read time from disk  : 154127 (ns)
Average write time to disk   :  20368 (ns)

-Robert
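Tom's "run it twice" check can be reproduced from psql; a hedged sketch assuming the pgbench `accounts` table used in the thread:

```sql
\timing on
-- First scan: visibility checks on freshly loaded rows may set hint bits,
-- dirtying pages that then have to be written out.
SELECT count(*) FROM accounts;
-- Second scan: if the dirtied pages were flushed in between, this run
-- should read clean pages and skip the hint-bit writes.
SELECT count(*) FROM accounts;
```

That both of Robert's runs show the same dirty-write count suggests the pages dirtied by the first scan had not yet been written back between runs, so the comparison is sensitive to checkpoint and bgwriter timing.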
Re: [PERFORM] Update performance degrades over time
Yes, we are updating one of the indexed timestamp columns, which gets a unique value on every update. We tried setting autovacuum_vacuum_scale_factor = 0.1 (from the default) to make autovacuum a bit more aggressive; we see bloating on both the table and its indexes, but it's creeping up slowly. Anyway, even with slower bloating, I still see update performance degrade, with 15 sec response times captured by setting log_min_duration_statement. It looks like bloating isn't what's causing the slower updates. Any help/ideas to tune this are appreciated. The explain plan seems reasonable for the update statement:

update tablexy set col2=$1, col9=$2, col10=$3, col11=$4, col3=$5 WHERE ID=$6;

QUERY PLAN
Index Scan using ct_tablexy_id_u1 on tablexy (cost=0.00..8.51 rows=1 width=194) (actual time=0.162..0.166 rows=1 loops=1)
  Index Cond: ((id)::text = '32xka8axki8'::text)

Thanks in advance.
Stalin

-----Original Message-----
From: Jeffrey Baker [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 15, 2008 6:56 AM
To: Subbiah Stalin-XCGF84; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Update performance degrades over time

On Wed, May 14, 2008 at 6:31 PM, Subbiah Stalin-XCGF84 <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> We are doing some load tests with our application running postgres
> 8.2.4. At times we see updates on a table taking longer (around
> 11-16 secs) than the expected sub-second response time. The table in
> question is getting updated constantly through the load tests. In
> checking the table size including indexes, they seem to be bloated; we got
> it confirmed after recreating it (stats below). We have autovacuum
> enabled with default parameters. I thought autovacuum would avoid
> bloating issues, but it looks like it's not aggressive enough. Wondering if
> table/index bloating is causing update slowness over a period of
> time. Any ideas how to troubleshoot this further?

Sometimes it is necessary to not only VACUUM, but also REINDEX. If your update changes an indexed column to a new, distinct value, you can easily get index bloat. Also, you should check to see if you have any old, open transactions on the same instance. If you do, it's possible that VACUUM will have no beneficial effect.

-jwb
Re: [PERFORM] I/O on select count(*)
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Heikki Linnakangas escribió:
>> We know what kind of a relation we're dealing with in ReadBuffer, so we
>> could add a flag to BufferDesc to mark heap pages.
>
> Hmm, I was thinking that it would need access to the catalogs to know
> where the tuples are, but that's certainly not true, so perhaps it could
> be made to work.

The issue in my mind is not so much whether the bgwriter could physically do it as that it's a violation of module layering. That has real consequences, like potential for deadlocks. It'll become particularly pressing if we go forward with the plans to get rid of the separate dedicated buffers for pg_clog etc. and have them work in the main shared-buffer pool.

			regards, tom lane
Re: [PERFORM] which ext3 fs type should I use for postgresql
"Joshua D. Drake" writes:
> Guillaume Cottenceau wrote:
>> Matthew Wakeling writes:
>>
>> It is still relevant, as with 5% margin, you can afford changing
>> that to 0% with tune2fs, just the time for you to start PG and
>> remove some data by SQL, then shutdown and set the margin to 5%
>> again.
>
> I find that if you actually reach that level of capacity failure it is
> due to lack of management and likely there is much lower hanging fruit
> left over by a lazy dba or sysadmin than having to adjust filesystem
> level parameters.
>
> Manage actively and the above change is absolutely irrelevant.

Of course. I didn't say otherwise; I only said that it's useful in that case. E.g. if you're using a dedicated partition for PG, then a good solution is what I describe, rather than horrifyingly trying to remove some random PG files, or when you cannot temporarily move some of them and symlink from the PG partition. I don't praise that kind of situation; it should of course be avoided by sane management. But bad management is not a reason for hiding solutions to the problems that can happen!

-- 
Guillaume Cottenceau
Re: [PERFORM] I/O on select count(*)
Matthew Wakeling wrote:

Aside from the rest of commentary, a slight clarification:

> So, as I understand it, Postgres works like this:
>
> 1. You begin a transaction. Postgres writes an entry into pg_clog.

Starting a transaction does not write anything to pg_clog.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] I/O on select count(*)
Heikki Linnakangas escribió:
> Alvaro Herrera wrote:
>> The problem is that the bgwriter does not understand about the content
>> of the pages it is writing -- they're opaque pages for all it knows. So
>> it cannot touch the hint bits.
>
> We know what kind of a relation we're dealing with in ReadBuffer, so we
> could add a flag to BufferDesc to mark heap pages.

Hmm, I was thinking that it would need access to the catalogs to know where the tuples are, but that's certainly not true, so perhaps it could be made to work.

>> If we had the bitmask in a separate map fork, this could be cheap.
>
> I don't buy that. The point of a hint bit is that it's right there along
> with the tuple you're looking at. If you have to look at a separate
> buffer, you might as well just look at clog.

True -- I was confusing this with the idea of having the tuple MVCC header (xmin, xmax, etc.) in a separate fork, which would make the idea of index-only scans more feasible at the expense of seqscans.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [PERFORM] which ext3 fs type should I use for postgresql
Guillaume Cottenceau wrote:
> Matthew Wakeling writes:
>
> It is still relevant, as with 5% margin, you can afford changing
> that to 0% with tune2fs, just the time for you to start PG and
> remove some data by SQL, then shutdown and set the margin to 5%
> again.

I find that if you actually reach that level of capacity failure, it is due to lack of management, and there is likely much lower-hanging fruit left over by a lazy DBA or sysadmin than having to adjust filesystem-level parameters.

Manage actively and the above change is absolutely irrelevant.

Joshua D. Drake
Re: [PERFORM] which ext3 fs type should I use for postgresql
Matthew Wakeling writes:
> On Thu, 15 May 2008, Guillaume Cottenceau wrote:
>> Also, IIRC when PG writes data up to a full filesystem,
>> postmaster won't be able to then restart if the filesystem is
>> still full (it needs some free disk space for its startup).
>>
>> Or maybe this has been fixed in recent versions?
>
> Ah, the "not enough space to delete file, delete some files and try
> again" problem. Anyway, that isn't relevant to the reserved
> percentage, as that will happen whether or not the filesystem is 5%
> smaller.

It is still relevant, as with the 5% margin you can afford to change it to 0% with tune2fs, just long enough for you to start PG and remove some data by SQL, then shut down and set the margin to 5% again.

-- 
Guillaume Cottenceau
Re: [PERFORM] which ext3 fs type should I use for postgresql
On Thu, 15 May 2008, Guillaume Cottenceau wrote:
> Also, IIRC when PG writes data up to a full filesystem,
> postmaster won't be able to then restart if the filesystem is
> still full (it needs some free disk space for its startup).
>
> Or maybe this has been fixed in recent versions?

Ah, the "not enough space to delete file, delete some files and try again" problem. Anyway, that isn't relevant to the reserved percentage, as that will happen whether or not the filesystem is 5% smaller.

Matthew

-- 
Let's say I go into a field and I hear "baa baa baa". Now, how do I work
out whether that was "baa" followed by "baa baa", or if it was "baa baa"
followed by "baa"?
                                        - Computer Science Lecturer
Re: [PERFORM] I/O on select count(*)
On Thu, 15 May 2008, Heikki Linnakangas wrote:
> There's not much point optimizing something that only helps with aborted
> transactions.

That's fair enough, but this list method is likely to speed up index writes anyway.

> The general problem with any idea that involves keeping a list of
> changes made in a transaction is that that list will grow big during
> bulk loads, so you'll have to overflow to disk or abandon the list
> approach. Which means that it won't help with bulk loads.

Yeah, it wouldn't be a list of changes for the transaction; it would be a list of changes since the last checkpoint. Keeping data in memory for the length of the transaction is doomed to failure, because there is no bound on its size, so bulk loads are still going to miss out on hint optimisation.

Matthew

-- 
for a in past present future; do
  for b in clients employers associates relatives neighbours pets; do
    echo "The opinions here in no way reflect the opinions of my $a $b."
  done; done
Re: [PERFORM] I/O on select count(*)
Alvaro Herrera wrote:
> Greg Smith escribió:
>> On Thu, 15 May 2008, Pavan Deolasee wrote:
>>> I had suggested in the past that whenever we set hint bits for a
>>> tuple, we should check all other tuples in the page and set their hint
>>> bits too to avoid multiple writes of the same page. I guess the idea
>>> got rejected because of lack of benchmarks to prove the benefit.
>>
>> From glancing at http://www.postgresql.org/docs/faqs.TODO.html I got
>> the impression the idea was to have the background writer get involved
>> to help with this particular situation.
>
> The problem is that the bgwriter does not understand about the content
> of the pages it is writing -- they're opaque pages for all it knows. So
> it cannot touch the hint bits.

We know what kind of a relation we're dealing with in ReadBuffer, so we could add a flag to BufferDesc to mark heap pages.

> If we had the bitmask in a separate map fork, this could be cheap.

I don't buy that. The point of a hint bit is that it's right there along with the tuple you're looking at. If you have to look at a separate buffer, you might as well just look at clog.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] I/O on select count(*)
Matthew Wakeling wrote:
> On Thu, 15 May 2008, Heikki Linnakangas wrote:
>>> Is it really safe to update the hint bits in place? If there is a power cut in the middle of writing a block, is there a guarantee from the disc that the block will never be garbled?
>>
>> Don't know, to be honest. We've never seen any reports of corrupted data that would suggest such a problem, but it doesn't seem impossible to me that some exotic storage system might do that.
>
> Hmm. That problem is what WAL full-page-writes is meant to handle, isn't it? So basically, if you're telling people that WAL full-page-writes is safer than partial WAL, because it avoids updating pages in-place, then you shouldn't be updating pages in-place for the hint bits either. You can't win!

Full-page-writes protect from torn pages, that is, when one half of an update hits the disk but the other one doesn't. In particular, if the beginning of the page, where the WAL pointer (XLogRecPtr) is, is flushed to disk, but the actual changes elsewhere in the page aren't, you're in trouble. WAL replay will look at the WAL pointer and think that the page doesn't need to be replayed, while the other half of the update is still missing.

Hint bits are different. We're only updating a single bit, and it doesn't matter from a correctness point of view whether the hint bit update hits the disk or not. What would spell trouble, however, is if the disk controller/whatever garbles the whole sector, IOW changes something other than the changed bit, while doing the update.

>>> In fact, if the tuple's creating transaction has aborted, then the tuple can be vacuumed right there and then before it is even written.
>>
>> Not if you have any indexes on the table. To vacuum, you'll have to scan all indexes to remove pointers to the tuple.
>
> Ah. Well, would that be so expensive? After all, someone has to do it eventually, and these are index entries that have only just been added anyway.

Scanning all indexes? Depends on your table, of course, but yes, it would be expensive in general.

> An alternative would be to build a "list of changes" in the WAL without actually changing the underlying index at all. When reading the index, you would read the "list" first (which would be in memory, and in an efficient-to-search structure), then read the original index and add the two. Then when checkpointing, vet all the changes against known aborted transactions before making all the changes to the index together. This is likely to speed up index writes quite a bit, and also allow you to effectively vacuum aborted tuples before they get written to the disc.

There's not much point optimizing something that only helps with aborted transactions. The general problem with any idea that involves keeping a list of changes made in a transaction is that the list will grow big during bulk loads, so you'll have to overflow to disk or abandon the list approach. Which means that it won't help with bulk loads.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
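The "list of changes" proposal above can be sketched in miniature. This is a hedged, hypothetical toy model (plain Python, not PostgreSQL code, and all names are invented for illustration): pending index entries live in an in-memory overlay that readers merge with the base index, and a checkpoint vets them against known aborted transactions before applying them in one batch.

```python
# Toy model of Matthew's "list of changes" idea: an in-memory overlay
# merged with the base index on reads, vetted against aborted
# transactions at checkpoint time. Not PostgreSQL internals.

class OverlayIndex:
    def __init__(self):
        self.base = {}      # key -> set of tuple ids already on disk
        self.pending = []   # (xid, key, tid) changes not yet applied

    def insert(self, xid, key, tid):
        # New entries go to the in-memory change list, not the base index.
        self.pending.append((xid, key, tid))

    def lookup(self, key):
        # Readers consult the overlay first, then the base, and merge.
        tids = set(self.base.get(key, set()))
        tids.update(tid for _, k, tid in self.pending if k == key)
        return tids

    def checkpoint(self, aborted_xids):
        # Vet pending changes: entries from aborted transactions are
        # dropped, i.e. "vacuumed" before they are ever written out.
        for xid, key, tid in self.pending:
            if xid not in aborted_xids:
                self.base.setdefault(key, set()).add(tid)
        self.pending.clear()

idx = OverlayIndex()
idx.insert(xid=100, key="a", tid=1)
idx.insert(xid=101, key="a", tid=2)
assert idx.lookup("a") == {1, 2}    # overlay is visible before checkpoint
idx.checkpoint(aborted_xids={101})  # xid 101 aborted; its entry is vetted out
assert idx.lookup("a") == {1}       # only the committed entry was applied
```

It also makes Heikki's objection concrete: during a bulk load, `pending` grows with every inserted row, so the list must either spill to disk or be abandoned.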
Re: [PERFORM] which ext3 fs type should I use for postgresql
Craig James writes:
> Matthew Wakeling wrote:
>> Probably of more use are some of the other settings:
>>
>> -m reserved-blocks-percentage - this reserves a portion of the filesystem that only root can write to. If root has no need for it, you can kill this by setting it to zero. The default is for 5% of the disc to be wasted.
>
> This is not a good idea. The 5% is NOT reserved for root's use, but rather is to prevent severe file fragmentation. As ...

Also, IIRC when PG writes data up to a full filesystem, the postmaster won't be able to restart if the filesystem is still full (it needs some free disk space for its startup). Or maybe this has been fixed in recent versions?

--
Guillaume Cottenceau
Re: [PERFORM] which ext3 fs type should I use for postgresql
Matthew Wakeling wrote:
> Probably of more use are some of the other settings:
>
> -m reserved-blocks-percentage - this reserves a portion of the filesystem that only root can write to. If root has no need for it, you can kill this by setting it to zero. The default is for 5% of the disc to be wasted.

This is not a good idea. The 5% is NOT reserved for root's use, but rather is to prevent severe file fragmentation. As the disk gets full, the remaining empty spaces tend to be small spaces scattered all over the disk, meaning that even for modest-sized files, the kernel can't allocate contiguous disk blocks. If you reduce this restriction to 0%, you are virtually guaranteed poor performance when you fill up your disk, since those files that are allocated last will be massively fragmented. Worse, the fragmented files that you create remain fragmented even if you clean up to get back below the 95% mark. If Postgres happened to insert a lot of data on a 99% full file system, those blocks could be spread all over the place, and they'd stay that way forever, even after you cleared some space.

Craig
Re: [PERFORM] I/O on select count(*)
Matthew Wakeling <[EMAIL PROTECTED]> writes:
> Hmm. That problem is what WAL full-page-writes is meant to handle, isn't it? So basically, if you're telling people that WAL full-page-writes is safer than partial WAL, because it avoids updating pages in-place, then you shouldn't be updating pages in-place for the hint bits either. You can't win!

This argument ignores the nature of the data change. With a hint-bit update, no data is being shuffled around, so there is no danger from a partial page write. A disk that leaves an individual sector corrupt would be a problem, but I don't think that's a huge risk. Keep in mind that disks aren't designed to just stop dead when power dies --- they are made to be able to park their heads before the juice is entirely gone. I think it's reasonable to assume they'll finish writing the sector in progress before they start parking.

regards, tom lane
Re: [PERFORM] I/O on select count(*)
Greg Smith escribió:
> On Thu, 15 May 2008, Pavan Deolasee wrote:
>> I had suggested in the past that whenever we set hint bits for a tuple, we should check all other tuples in the page and set their hint bits too to avoid multiple writes of the same page. I guess the idea got rejected because of lack of benchmarks to prove the benefit.
>
> From glancing at http://www.postgresql.org/docs/faqs.TODO.html I got the impression the idea was to have the background writer get involved to help with this particular situation.

The problem is that the bgwriter does not understand about the content of the pages it is writing -- they're opaque pages for all it knows. So it cannot touch the hint bits.

I agree with Pavan that setting hint bits in batches, instead of just for the tuple being examined, is likely a benefit. However, it's perhaps not so good to be doing it in a foreground process, because you're imposing extra cost on the client queries, which we want to be as fast as possible. Perhaps the thing to do is have a "database-local bgwriter" which would scan pages and do this kind of change ... a different kind of process to be launched by autovacuum, perhaps. If we had the bitmask in a separate map fork, this could be cheap.

--
Alvaro Herrera                      http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] Update performance degrades over time
On Wed, May 14, 2008 at 6:31 PM, Subbiah Stalin-XCGF84 <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> We are doing some load tests with our application running postgres 8.2.4. At times we see updates on a table taking longer (around 11-16 secs) than the expected sub-second response time. The table in question is getting updated constantly through the load tests. In checking the table size including indexes, they seem to be bloated; we got it confirmed after recreating it (stats below). We have autovacuum enabled with default parameters. I thought autovacuum would avoid bloating issues, but it looks like it's not aggressive enough. Wondering if table/index bloat is causing update slowness over a period of time. Any ideas how to troubleshoot this further?

Sometimes it is necessary to not only VACUUM, but also REINDEX. If your update changes an indexed column to a new, distinct value, you can easily get index bloat.

Also, you should check to see if you have any old, open transactions on the same instance. If you do, it's possible that VACUUM will have no beneficial effect.

-jwb
Re: [PERFORM] I/O on select count(*)
Matthew Wakeling wrote:
> On Thu, 15 May 2008, Luke Lonergan wrote:
>> ...HINT bit optimization, but avoids this whole "write the data, write it to the log also, then write it again just for good measure" ...
>
> The hint data will be four bits per tuple plus overheads, so it could be made very compact, and therefore likely to stay in the cache fairly well.

Does it seem like these HINT bits would be good candidates to move off to map forks, similar to how the visibility map stuff will be handled? Since (if I understand right) only the hint bits change during the select count(*), it seems a lot less write I/O would happen if such a map were updated rather than the data pages themselves.
Re: [PERFORM] I/O on select count(*)
On Thu, 15 May 2008, Heikki Linnakangas wrote:
>> Is it really safe to update the hint bits in place? If there is a power cut in the middle of writing a block, is there a guarantee from the disc that the block will never be garbled?
>
> Don't know, to be honest. We've never seen any reports of corrupted data that would suggest such a problem, but it doesn't seem impossible to me that some exotic storage system might do that.

Hmm. That problem is what WAL full-page-writes is meant to handle, isn't it? So basically, if you're telling people that WAL full-page-writes is safer than partial WAL, because it avoids updating pages in-place, then you shouldn't be updating pages in-place for the hint bits either. You can't win!

>> In fact, if the tuple's creating transaction has aborted, then the tuple can be vacuumed right there and then before it is even written.
>
> Not if you have any indexes on the table. To vacuum, you'll have to scan all indexes to remove pointers to the tuple.

Ah. Well, would that be so expensive? After all, someone has to do it eventually, and these are index entries that have only just been added anyway. I can understand index updating being a bit messy in the middle of a checkpoint though, as you would have to write the update to the WAL, which you are checkpointing...

So, I don't know exactly how the WAL updates to indexes work, but my guess is that it has been implemented as "write the blocks that we would change to the WAL". The problem with this is that all the changes to the index are done individually, so there's no easy way to "undo" one of them later on when you find out that the transaction has been aborted during the checkpoint.

An alternative would be to build a "list of changes" in the WAL without actually changing the underlying index at all. When reading the index, you would read the "list" first (which would be in memory, and in an efficient-to-search structure), then read the original index and add the two. Then when checkpointing, vet all the changes against known aborted transactions before making all the changes to the index together. This is likely to speed up index writes quite a bit, and also allow you to effectively vacuum aborted tuples before they get written to the disc.

Matthew

--
Vacuums are nothings. We only mention them to let them know we know they're there.
Re: [PERFORM] I/O on select count(*)
On Thursday 15 May 2008 03:02:19 Tom Lane wrote:
> "Jan de Visser" <[EMAIL PROTECTED]> writes:
>> Obviously, this issue is tied to the slow count(*) one, as I found out the hard way. Consider the following scenario:
>> * Insert row
>> * Update that row a couple of times
>> * Rinse and repeat many times
>>
>> Now somewhere during that cycle, do a select count(*) just to see where you are. You will be appalled by how slow that is, due to not only the usual 'slow count(*)' reasons. This whole hint bit business makes it even worse, as demonstrated by the fact that running a vacuum before the count(*) makes the latter noticeably faster.
>
> Uh, well, you can't blame that entirely on hint-bit updates. The vacuum has simply *removed* two-thirds of the rows in the system, resulting in a large drop in the number of rows that the select even has to look at.
>
> It's certainly true that hint-bit updates cost something, but quantifying how much isn't easy. The off-the-cuff answer is to do the select count(*) twice and see how much cheaper the second one is. But there are two big holes in that answer: the first is the possible cache effects from having already read in the pages, and the second is that the follow-up scan gets to avoid the visits to pg_clog that the first scan had to make (which after all is the point of the hint bits).
>
> I don't know any easy way to disambiguate the three effects that are at work here. But blaming it all on the costs of writing out hint-bit updates is wrong.
>
> regards, tom lane

True. But it still contributes to the fact that queries sometimes behave in a non-deterministic way, which IMHO is the major annoyance when starting to work with pgsql. And contrary to other causes (vacuum, checkpoints), this is woefully underdocumented.

jan
Re: [PERFORM] I/O on select count(*)
On Thu, 15 May 2008, Luke Lonergan wrote:
> BTW we've removed HINT bit checking in Greenplum DB and improved the visibility caching, which was enough to provide performance at the same level as with the HINT bit optimization, but avoids this whole "write the data, write it to the log also, then write it again just for good measure" behavior.

This sounds like a good option. I believe I suggested this a few months ago, however it was rejected because in the worst case (when the hints are not cached), if you're doing an index scan, you can do twice the number of seeks as before.

http://archives.postgresql.org/pgsql-performance/2007-12/msg00217.php

The hint data will be four bits per tuple plus overheads, so it could be made very compact, and therefore likely to stay in the cache fairly well. Each tuple fetched would have to be spaced really far apart in the database table in order to exhibit the worst case, because fetching a page of hint cache will cause 64kB or so of disc to appear in the disc's read-ahead buffer, which will be equivalent to 128MB worth of database table (assuming eight tuples per block and no overhead). As soon as you access another tuple in the same 128MB bracket, you'll hit the disc read-ahead buffer for the hints.

On balance, to me it still seems like a good option.

Matthew

--
Those who do not understand Unix are condemned to reinvent it, poorly. -- Henry Spencer
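Matthew's back-of-envelope coverage figure can be checked directly from the assumptions stated in the post (4 hint bits per tuple, 8 tuples per 8 kB block, no overhead):

```python
# Check of the coverage arithmetic above. Assumptions are taken from the
# post itself: 4 hint bits per tuple, 8 tuples per 8 kB heap block.
HINT_BITS_PER_TUPLE = 4
TUPLES_PER_BLOCK = 8
BLOCK_SIZE = 8 * 1024                      # default PostgreSQL block size

# Hint data needed per heap block, in bytes.
hint_bytes_per_block = HINT_BITS_PER_TUPLE * TUPLES_PER_BLOCK // 8  # = 4

# One 64 kB read-ahead of the hint cache covers this many heap blocks...
blocks_covered = (64 * 1024) // hint_bytes_per_block                # = 16384

# ...which corresponds to this much heap.
heap_covered_mb = blocks_covered * BLOCK_SIZE // 2**20

print(hint_bytes_per_block, blocks_covered, heap_covered_mb)  # 4 16384 128
```

So 64 kB of hint data does indeed map to 128 MB of table, matching the "128MB bracket" in the message.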
Re: [PERFORM] I/O on select count(*)
Matthew Wakeling wrote:
> Is it really safe to update the hint bits in place? If there is a power cut in the middle of writing a block, is there a guarantee from the disc that the block will never be garbled?

Don't know, to be honest. We've never seen any reports of corrupted data that would suggest such a problem, but it doesn't seem impossible to me that some exotic storage system might do that.

> Is there a way to make a shortcut and have the hint bits written the first time the data is written to the table? One piece of obvious low-hanging fruit would be to enhance step five above, so that the bgwriter or checkpoint that writes the data to the database table checks the pg_clog and writes the correct hint bits.

Yep, that's an idea that's been suggested before. In fact, I seem to remember a patch to do just that. Don't remember what happened to it.

> In fact, if the tuple's creating transaction has aborted, then the tuple can be vacuumed right there and then before it is even written.

Not if you have any indexes on the table. To vacuum, you'll have to scan all indexes to remove pointers to the tuple.

> However, this idea does not deal well with bulk data loads, where the data is checkpointed before the transaction is committed or aborted.

Yep, that's the killer :-(.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PERFORM] I/O on select count(*)
On Wed, 14 May 2008, Alvaro Herrera wrote:
> Hint bits are used to mark tuples as created and/or deleted by transactions that are known committed or aborted. To determine the visibility of a tuple without such bits set, you need to consult pg_clog and possibly pg_subtrans, so it is an expensive check.

So, as I understand it, Postgres works like this:

1. You begin a transaction. Postgres writes an entry into pg_clog.
2. You write some tuples. Postgres writes them to the WAL, but doesn't bother fsyncing.
3. At some point, the bgwriter or a checkpoint may write the tuples to the database tables, and fsync the lot.
4. You commit the transaction. Postgres alters pg_clog again, writes that to the WAL, and fsyncs the WAL.
5. If the tuples hadn't already made it to the database tables, then a checkpoint or bgwriter will do it later on, and fsync the lot.
6. You read the tuples. Postgres reads them from the database table, looks in pg_clog, notices that the transaction has been committed, and writes the tuples to the database table again with the hint bits set. This write is not WAL protected, and is not fsynced.

This seems like a good architecture, with some cool characteristics, mainly that at no point does Postgres have to hold vast quantities of data in memory. I have two questions though:

Is it really safe to update the hint bits in place? If there is a power cut in the middle of writing a block, is there a guarantee from the disc that the block will never be garbled?

Is there a way to make a shortcut and have the hint bits written the first time the data is written to the table? One piece of obvious low-hanging fruit would be to enhance step five above, so that the bgwriter or checkpoint that writes the data to the database table checks the pg_clog and writes the correct hint bits. In fact, if the tuple's creating transaction has aborted, then the tuple can be vacuumed right there and then before it is even written.

For OLTP, almost all the hint bits will be written the first time, and also the set of transactions that will be looked up in pg_clog will be small (the set of transactions that were active since the last checkpoint), so its cache coherency will be good. However, this idea does not deal well with bulk data loads, where the data is checkpointed before the transaction is committed or aborted.

Matthew

--
Now, you would have thought these coefficients would be integers, given that we're working out integer results. Using a fraction would seem really stupid. Well, I'm quite willing to be stupid here - in fact, I'm going to use complex numbers.  -- Computer Science Lecturer
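Step 6 above is the crux of the whole thread, and it can be illustrated with a toy model (plain Python, not PostgreSQL internals — `clog_status` and the dict-based tuples are invented stand-ins): the first scan of each tuple must consult the clog, and setting the hint bit memoizes the answer so later scans skip the clog lookup entirely.

```python
# Toy model of hint-bit behavior: first scan pays a clog lookup per
# tuple and sets the hint bit; subsequent scans take the fast path.
# Not PostgreSQL code -- a sketch of the mechanism described above.

clog = {1: "committed", 2: "aborted"}   # xid -> final transaction state
clog_lookups = 0

def clog_status(xid):
    global clog_lookups
    clog_lookups += 1                   # each call models a pg_clog visit
    return clog[xid]

def is_visible(tup):
    # Fast path: hint bit already set, no clog access needed.
    if tup.get("hint") is not None:
        return tup["hint"] == "committed"
    # Slow path: consult clog, then set the hint bit in place. This
    # in-place write is what dirties otherwise-clean pages during a
    # plain SELECT -- it is not WAL-logged and not fsynced.
    tup["hint"] = clog_status(tup["xmin"])
    return tup["hint"] == "committed"

table = [{"xmin": 1}, {"xmin": 1}, {"xmin": 2}]
first = sum(is_visible(t) for t in table)    # first count(*): 3 clog visits
lookups_after_first = clog_lookups
second = sum(is_visible(t) for t in table)   # second count(*): 0 clog visits
print(first, second, lookups_after_first, clog_lookups)  # 2 2 3 3
```

This is also why the second `select count(*)` in Tom Lane's experiment further down the thread is cheaper: the hints were written by the first scan.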
Re: [PERFORM] which ext3 fs type should I use for postgresql
On Thu, 15 May 2008, [EMAIL PROTECTED] wrote:
> IIRC postgres likes to do 1M/file, which isn't very large as far as the -T setting goes.

ITYF it's actually 1GB/file.

> think twice about this. ext2/3 get slow when they fill up (they have fragmentation problems when free space gets too small); this 5% that only root can use also serves as a buffer against that as well.

It makes sense to me that the usage pattern of Postgres would be much less susceptible to causing fragmentation than normal filesystem usage. Has anyone actually tested this and found out?

Matthew

--
Isn't "Microsoft Works" something of a contradiction?
Re: [PERFORM] which ext3 fs type should I use for postgresql
On Thu, 15 May 2008, Matthew Wakeling wrote:
> On Thu, 15 May 2008, Philippe Amelant wrote:
>> using mkfs.ext3 I can use "-T" to tune the filesystem
>>
>> mkfs.ext3 -T fs_type ...
>>
>> fs_type are in /etc/mke2fs.conf (on debian)
>
> If you look at that file, you'd see that tuning really doesn't change that much. In fact, the only thing it does change (if you avoid "small" and "floppy") is the number of inodes available in the filesystem. Since Postgres tends to produce few large files, you don't need that many inodes, so the "largefile" option may be best. However, note that the number of inodes is a hard limit of the filesystem - if you try to create more files on the filesystem than there are available inodes, then you will get an out of space error even if the filesystem has space left. The only real benefit of having not many inodes is that you waste a little less space, so many admins are pretty generous with this setting.

IIRC postgres likes to do 1M/file, which isn't very large as far as the -T setting goes.

> Probably of more use are some of the other settings:
>
> -m reserved-blocks-percentage - this reserves a portion of the filesystem that only root can write to. If root has no need for it, you can kill this by setting it to zero. The default is for 5% of the disc to be wasted.

think twice about this. ext2/3 get slow when they fill up (they have fragmentation problems when free space gets too small); this 5% that only root can use also serves as a buffer against that as well.

> -j turns the filesystem into ext3 instead of ext2 - many people say that for Postgres you shouldn't do this, as ext2 is faster.

for the partition with the WAL on it you may as well do ext2 (the WAL is written synchronously and sequentially so the journal doesn't help you), but for the data partition you may benefit from the journal.

David Lang
Re: [PERFORM] which ext3 fs type should I use for postgresql
On Thu, 15 May 2008, Philippe Amelant wrote:
> using mkfs.ext3 I can use "-T" to tune the filesystem
>
> mkfs.ext3 -T fs_type ...
>
> fs_type are in /etc/mke2fs.conf (on debian)

If you look at that file, you'd see that tuning really doesn't change that much. In fact, the only thing it does change (if you avoid "small" and "floppy") is the number of inodes available in the filesystem. Since Postgres tends to produce few large files, you don't need that many inodes, so the "largefile" option may be best. However, note that the number of inodes is a hard limit of the filesystem - if you try to create more files on the filesystem than there are available inodes, then you will get an out of space error even if the filesystem has space left. The only real benefit of having not many inodes is that you waste a little less space, so many admins are pretty generous with this setting.

Probably of more use are some of the other settings:

-m reserved-blocks-percentage - this reserves a portion of the filesystem that only root can write to. If root has no need for it, you can kill this by setting it to zero. The default is for 5% of the disc to be wasted.

-j turns the filesystem into ext3 instead of ext2 - many people say that for Postgres you shouldn't do this, as ext2 is faster.

Matthew

--
The surest protection against temptation is cowardice. -- Mark Twain
[PERFORM] which ext3 fs type should I use for postgresql
Hi all,

using mkfs.ext3 I can use "-T" to tune the filesystem:

mkfs.ext3 -T fs_type ...

fs_type values are in /etc/mke2fs.conf (on debian).

Is there a recommended setting for this parameter?

thanks
Re: [PERFORM] I/O on select count(*)
Greg Smith <[EMAIL PROTECTED]> writes:
> ... To put that another way: if I've run a manual vacuum, is it true that it will have updated all the hint bits to XMIN_COMMITTED for all the tuples that were all done when the vacuum started?

Any examination whatsoever of a tuple --- whether by vacuum or any ordinary DML operation --- will update its hint bits to match the commit/abort status of the inserting/deleting transaction(s) as of the instant of the examination. Your statement above is true, but is weaker than reality.

regards, tom lane
Re: [PERFORM] I/O on select count(*)
"Jan de Visser" <[EMAIL PROTECTED]> writes:
> Obviously, this issue is tied to the slow count(*) one, as I found out the hard way. Consider the following scenario:
> * Insert row
> * Update that row a couple of times
> * Rinse and repeat many times
>
> Now somewhere during that cycle, do a select count(*) just to see where you are. You will be appalled by how slow that is, due to not only the usual 'slow count(*)' reasons. This whole hint bit business makes it even worse, as demonstrated by the fact that running a vacuum before the count(*) makes the latter noticeably faster.

Uh, well, you can't blame that entirely on hint-bit updates. The vacuum has simply *removed* two-thirds of the rows in the system, resulting in a large drop in the number of rows that the select even has to look at.

It's certainly true that hint-bit updates cost something, but quantifying how much isn't easy. The off-the-cuff answer is to do the select count(*) twice and see how much cheaper the second one is. But there are two big holes in that answer: the first is the possible cache effects from having already read in the pages, and the second is that the follow-up scan gets to avoid the visits to pg_clog that the first scan had to make (which after all is the point of the hint bits).

I don't know any easy way to disambiguate the three effects that are at work here. But blaming it all on the costs of writing out hint-bit updates is wrong.

regards, tom lane