Re: [PERFORM] SSD + RAID
Bruce Momjian wrote: I always assumed SCSI disks had a write-through cache and therefore didn't need a drive cache flush comment.

There's more detail on all this mess at http://wiki.postgresql.org/wiki/SCSI_vs._IDE/SATA_Disks and it includes this perception, which I've recently come to believe isn't actually correct anymore. Like the IDE crowd, it looks like one day somebody said "hey, we lose every write-heavy benchmark badly because we only have a write-through cache", and that principle fell by the wayside. What has been true, and I'm starting to think this is what we've all been observing rather than a write-through cache, is that the proper cache flushing commands have been there in working form for so much longer that it's more likely your SCSI driver and drive do the right thing if the filesystem asks them to. SCSI SYNCHRONIZE CACHE has a much longer and prouder history than IDE's FLUSH_CACHE and SATA's FLUSH_CACHE_EXT.

It's also worth noting that many current SAS drives, the current SCSI incarnation, are basically SATA drives with a bridge chipset stuck onto them, or with just the interface board swapped out. This is one reason why top-end SAS capacities lag behind consumer SATA drives. They use the consumers as beta testers to get the really fundamental firmware issues sorted out, and once things are stable they start stamping out the version with the SAS interface instead. (Note that there's a parallel manufacturing approach that makes much smaller SAS drives, the 2.5" server models or those at higher RPMs, that doesn't go through this path. Those are also the really expensive models, due to economy-of-scale issues.) The idea that these would have fundamentally different write cache behavior doesn't really follow from that development model.
At this point, there are only two common differences between "consumer" and "enterprise" hard drives of the same size and RPM, when there are directly matching ones:

1) You might get SAS instead of SATA as the interface, which provides the more mature command set I was talking about above--and therefore may give you a sane write-back cache with proper flushing, which is all the database really expects.

2) The timeouts when there's a read/write problem are tuned down in the enterprise version, to be more compatible with RAID setups where you want to push the drive off-line when this happens rather than presuming you can fix it. Consumers would prefer that the drive spent a lot of time doing heroics to try and save their sole copy of the apparently missing data.

You might get a slightly higher grade of parts if you're lucky too; I wouldn't count on it though. That seems to be saved for the high-RPM or smaller-size drives only.

-- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SSD + RAID
Greg Smith wrote: > Ron Mayer wrote: > > Linux apparently sends FLUSH_CACHE commands to IDE drives in the > > exact same places it sends SYNCHRONIZE CACHE commands to SCSI > > drives[2]. > > [2] http://hardware.slashdot.org/comments.pl?sid=149349&cid=12519114 > > > > Well, that's old enough to not even be completely right anymore about > SATA disks and kernels. It's FLUSH_CACHE_EXT that's been added to ATA-6 > to do the right thing on modern drives and that gets used nowadays, and > that doesn't necessarily do so on most of the SSDs out there; all of > which Bruce's recent doc additions now talk about correctly. > > There's this one specific area we know about that the most popular > systems tend to get really wrong all the time; that's got the > appropriate warning now with the right magic keywords that people can > look into it more if motivated. While it would be nice to get super > thorough and document everything, I think there's already more docs in > there than this project would prefer to have to maintain in this area. > > Are we going to get into IDE, SATA, SCSI, SAS, FC, and iSCSI? If the > idea is to be complete that's where this would go. I don't know that > the documentation needs to address every possible way every possible > filesystem can be flushed.

The bottom line is that the reason we have so much detailed documentation about this is that mostly only database folks care about such issues, so we end up having to research and document this ourselves --- I don't see any alternatives.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
Re: [PERFORM] SSD + RAID
Ron Mayer wrote: > Bruce Momjian wrote: > > Greg Smith wrote: > >> Bruce Momjian wrote: > >>> I have added documentation about the ATAPI drive flush command, and the > >> > >> If one of us goes back into that section one day to edit again it might > >> be worth mentioning that FLUSH CACHE EXT is the actual ATAPI-6 command > >> that a drive needs to support properly. I wouldn't bother with another > >> doc edit commit just for that specific part though, pretty obscure. > > > > That setting name was not easy to find so I added it to the > > documentation. > > If we're spelling out specific IDE commands, it might be worth > noting that the corresponding SCSI command is "SYNCHRONIZE CACHE"[1]. > > > Linux apparently sends FLUSH_CACHE commands to IDE drives in the > exact same places it sends SYNCHRONIZE CACHE commands to SCSI > drives[2]. > > It seems that the same file systems, SW raid layers, > virtualization platforms, and kernels that have a problem > sending FLUSH CACHE commands to SATA drives have the exact > same problems sending SYNCHRONIZE CACHE commands to SCSI drives. > With the exact same effect of not getting writes all the way > through disk caches.

I always assumed SCSI disks had a write-through cache and therefore didn't need a drive cache flush comment.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
Re: [PERFORM] Query slowing down significantly??
Rainer Pruy wrote: Thanks for the hint. I should have been considering that in the first place. (But the obvious is easily left unrecognised.) The prepared statement gives:

QUERY PLAN
---
Nested Loop (cost=0.00..25.18 rows=2 width=175) (actual time=36.116..49.998 rows=1 loops=1)
  -> Index Scan using x_context_01 on context c (cost=0.00..10.76 rows=2 width=67) (actual time=0.029..6.947 rows=12706 loops=1)
        Index Cond: ((contextid)::text = $1)
  -> Index Scan using x_fk_context_hierarchy_02 on context_hierarchy h (cost=0.00..7.20 rows=1 width=108) (actual time=0.003..0.003 rows=0 loops=12706)
        Index Cond: (h.contextidx = c.idx)
        Filter: (((h.hierarchyname)::text = $2) AND (h.parentidx = $3))
Total runtime: 50.064 ms
(7 rows)

And that is quite a bad plan given the current distribution of values.

Another approach might be to rewrite the recursion into your hierarchy with the WITH RECURSIVE option, new in 8.4, in SQL queries. The possible gains there are way beyond anything you can accomplish with optimizing recursive functions.

Regards, Yeb Havinga
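For illustration, a minimal sketch of what such a rewrite might look like, assuming the column names visible in the plans above (idx, parentidx, contextidx, hierarchyname); the actual shape of the recursion depends on what the application's recursive functions do:

```sql
-- Hypothetical sketch: walk an entire subtree of the hierarchy in one query
-- instead of issuing one lookup per level from the application. Column names
-- are taken from the plans quoted above; the starting parentidx is the value
-- from the example query.
WITH RECURSIVE subtree AS (
    SELECT h.idx, h.parentidx, h.contextidx
    FROM context_hierarchy h
    WHERE h.hierarchyname = 'InsuranceHierarchy'
      AND h.parentidx = 49292395
    UNION ALL
    SELECT h.idx, h.parentidx, h.contextidx
    FROM context_hierarchy h
    JOIN subtree s ON h.parentidx = s.idx
)
SELECT c.*, s.idx, s.parentidx
FROM subtree s
JOIN context c ON c.idx = s.contextidx;
```

One query like this replaces the 10k-20k per-row lookups mentioned earlier in the thread, which is where the large gains come from.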
[PERFORM] partition pruning
When I use intervals in my query, e.g. col1 between current_timestamp - interval '10 days' and current_timestamp, the optimizer checks ALL partitions, whereas if I use col1 between two hardcoded dates, only the applicable partitions are scanned.
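That is expected on the 8.x releases discussed in this thread: constraint exclusion happens at plan time, so it can only prune on constants, and current_timestamp is only evaluated at execution. A common workaround is to interpolate the boundary dates as literals, e.g. from the application, or in a PL/pgSQL function via EXECUTE (a sketch assuming 8.4, with a hypothetical partitioned table parent_table and function name):

```sql
-- Sketch of the usual workaround: build the query text with literal dates so
-- the planner sees constants and can exclude partitions. parent_table, col1,
-- and recent_rows() are placeholders for the real names.
CREATE OR REPLACE FUNCTION recent_rows()
RETURNS SETOF parent_table AS $$
BEGIN
    -- The casts to text plus quote_literal turn the computed timestamps
    -- into plan-time constants in the generated statement.
    RETURN QUERY EXECUTE
        'SELECT * FROM parent_table WHERE col1 BETWEEN '
        || quote_literal((now() - interval '10 days')::text)
        || ' AND ' || quote_literal(now()::text);
END;
$$ LANGUAGE plpgsql;
```

Checking the EXPLAIN output of the generated statement should then show only the matching partitions being scanned.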
Re: [PERFORM] Query slowing down significantly??
I'm already at it. It is a Java app, using JDBC, but through a proprietary persistence framework. I'm just busy evaluating the effects on the app of prohibiting prepared statements via JDBC. If this is not worthwhile, I'm bound to some expensive reorganizations, sigh. Nevertheless, thanks for your help in reminding me about the obvious use of prepared statements. Rainer

PS: I've just read the thread on "Avoiding bad prepared-statement plans". Very interesting. Will track this...

On 01.03.2010 19:15, Tom Lane wrote: > Rainer Pruy writes: >> The prepared statement gives: >> ... >> And that is quite a bad plan given the current distribution of values. > > Yeah. The planner really needs to know the actual parameter values in > order to pick the best plan for this case. > > One thing that you might be able to do to avoid giving up on prepared > statements entirely is to use an "unnamed" rather than named prepared > statement here. That will lead to the query plan being prepared only > when the parameter values are made available, rather than in advance. > It'd depend on what client library you're using whether this is a simple > change or not. > > regards, tom lane
Re: [PERFORM] Query slowing down significantly??
Rainer Pruy writes: > The prepared statement gives: > ... > And that is quite a bad plan given the current distribution of values. Yeah. The planner really needs to know the actual parameter values in order to pick the best plan for this case. One thing that you might be able to do to avoid giving up on prepared statements entirely is to use an "unnamed" rather than named prepared statement here. That will lead to the query plan being prepared only when the parameter values are made available, rather than in advance. It'd depend on what client library you're using whether this is a simple change or not. regards, tom lane
Re: [PERFORM] Query slowing down significantly??
Thanks for the hint. I should have been considering that in the first place. (But the obvious is easily left unrecognised.) The prepared statement gives:

QUERY PLAN
---
Nested Loop (cost=0.00..25.18 rows=2 width=175) (actual time=36.116..49.998 rows=1 loops=1)
  -> Index Scan using x_context_01 on context c (cost=0.00..10.76 rows=2 width=67) (actual time=0.029..6.947 rows=12706 loops=1)
        Index Cond: ((contextid)::text = $1)
  -> Index Scan using x_fk_context_hierarchy_02 on context_hierarchy h (cost=0.00..7.20 rows=1 width=108) (actual time=0.003..0.003 rows=0 loops=12706)
        Index Cond: (h.contextidx = c.idx)
        Filter: (((h.hierarchyname)::text = $2) AND (h.parentidx = $3))
Total runtime: 50.064 ms
(7 rows)

And that is quite a bad plan given the current distribution of values.

Regards, Rainer

On 01.03.2010 17:15, Tom Lane wrote: > Rainer Pruy writes: >> Normally the following Query behaves well: > >> select c.*, h.* >> from Context c, Context_Hierarchy h >> where c.Idx = h.ContextIdx and c.ContextId='testID' and >> h.HierarchyName='InsuranceHierarchy' and h.ParentIdx=49292395 >> ; >> QUERY >> PLAN >> -- >> Nested Loop (cost=0.00..43.57 rows=4 width=175) (actual time=0.291..0.293 >> rows=1 loops=1) >> -> Index Scan using uk_context_hierarchy_01 on context_hierarchy h >> (cost=0.00..14.76 rows=4 width=108) (actual time=0.169..0.169 >> rows=1 loops=1) >> Index Cond: (((hierarchyname)::text = 'InsuranceHierarchy'::text) >> AND (parentidx = 49292395)) >> -> Index Scan using pk_context on context c (cost=0.00..7.20 rows=1 >> width=67) (actual time=0.110..0.111 rows=1 loops=1) >> Index Cond: (c.idx = h.contextidx) >> Filter: ((c.contextid)::text = 'testID'::text) >> Total runtime: 0.388 ms >> (7 rows) > >> (From a freshly started PG) > >> However during a long-term read-only transaction (actually just about 15 min) >> (the transaction is issuing about 10k-20k of such queries among others) >> PG is logging a number of the following: > >> Mar 1 09:58:09 gaia postgres[20126]: [25-1] 
LOG: 0: duration: 343.663 >> ms execute S_5: select c.*, h.Idx as h_Idx, h.WbuIdx as >> h_WbuIdx, h.OrigWbuIdx as h_OrigWbuIdx, h.Ts as h_Ts, h.UserId as h_UserId, h.ParentIdx as >> h_ParentIdx, h.ContextIdx as >> h_ContextIdx, h.HierarchyName as h_HierarchyName, h.HierarchyPath as >> h_HierarchyPath from Context c, Context_Hierarchy h where >> c.Idx = h.ContextIdx and c.ContextId=$1 and h.HierarchyName=$2 and >> h.ParentIdx=$3 >> Mar 1 09:58:09 gaia postgres[20126]: [25-2] DETAIL: parameters: $1 = >> 'testID', $2 = 'InsuranceHierarchy', $3 = '49292395' >> Mar 1 09:58:09 gaia postgres[20126]: [25-3] LOCATION: >> exec_execute_message, postgres.c:1988 > > That's not the same query at all, and it may not be getting the same > plan. What you need to do to check the plan is to try PREPARE-ing > and EXPLAIN EXECUTE-ing the query with the same parameter symbols > as are actually used in the application-issued query. > > You might be entertained by the recent thread on -hackers about > "Avoiding bad prepared-statement plans" ... > > regards, tom lane
Re: [PERFORM] Query slowing down significantly??
Rainer Pruy writes: > Normally the following Query behaves well: > select c.*, h.* > from Context c, Context_Hierarchy h > where c.Idx = h.ContextIdx and c.ContextId='testID' and > h.HierarchyName='InsuranceHierarchy' and h.ParentIdx=49292395 > ; > QUERY > PLAN > -- > Nested Loop (cost=0.00..43.57 rows=4 width=175) (actual time=0.291..0.293 > rows=1 loops=1) > -> Index Scan using uk_context_hierarchy_01 on context_hierarchy h > (cost=0.00..14.76 rows=4 width=108) (actual time=0.169..0.169 > rows=1 loops=1) > Index Cond: (((hierarchyname)::text = 'InsuranceHierarchy'::text) > AND (parentidx = 49292395)) > -> Index Scan using pk_context on context c (cost=0.00..7.20 rows=1 > width=67) (actual time=0.110..0.111 rows=1 loops=1) > Index Cond: (c.idx = h.contextidx) > Filter: ((c.contextid)::text = 'testID'::text) > Total runtime: 0.388 ms > (7 rows) > (From a freshly started PG) > However during a long-term read-only transaction (actually just about 15 min) > (the transaction is issuing about 10k-20k of such queries among others) > PG is logging a number of the following: > Mar 1 09:58:09 gaia postgres[20126]: [25-1] LOG: 0: duration: 343.663 > ms execute S_5: select c.*, h.Idx as h_Idx, h.WbuIdx as > h_WbuIdx, h.OrigWbuIdx as h_OrigWbuIdx, h.Ts as h_Ts, h.UserId as h_UserId, h.ParentIdx as h_ParentIdx, h.ContextIdx as h_ContextIdx, > h.HierarchyName as h_HierarchyName, h.HierarchyPath as > h_HierarchyPath from Context c, Context_Hierarchy h where c.Idx = h.ContextIdx and c.ContextId=$1 and h.HierarchyName=$2 and > h.ParentIdx=$3 > Mar 1 09:58:09 gaia postgres[20126]: [25-2] DETAIL: parameters: $1 = > 'testID', $2 = 'InsuranceHierarchy', $3 = '49292395' > Mar 1 09:58:09 gaia postgres[20126]: [25-3] LOCATION: exec_execute_message, postgres.c:1988 That's not the same query at all, and it may not be getting the same plan. 
What you need to do to check the plan is to try PREPARE-ing and EXPLAIN EXECUTE-ing the query with the same parameter symbols as are actually used in the application-issued query. You might be entertained by the recent thread on -hackers about "Avoiding bad prepared-statement plans" ... regards, tom lane
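Concretely, Tom's suggestion might look like the following, using the query and parameter values from the log excerpt above (the parameter types are guesses based on the table definitions seen later in the thread):

```sql
-- Prepare the statement the same way the JDBC driver does, with parameter
-- symbols instead of literals, so the server builds the generic plan.
PREPARE s5 (varchar, varchar, integer) AS
    SELECT c.*, h.*
    FROM Context c, Context_Hierarchy h
    WHERE c.Idx = h.ContextIdx
      AND c.ContextId = $1
      AND h.HierarchyName = $2
      AND h.ParentIdx = $3;

-- Show the plan actually used for the prepared statement, with the
-- values from the log message.
EXPLAIN EXECUTE s5 ('testID', 'InsuranceHierarchy', 49292395);

DEALLOCATE s5;
```

Comparing this plan against the one from the literal-value query makes the difference between the generic and the parameter-specific plan visible.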
Re: [PERFORM] Autovacuum Tuning advice
Sorry, this is a “black box” application, I am bound by what they give me as far as table layout, but I fully understand the rationale. I believe this application spent its beginnings with Oracle, which explains the blanket use of VARCHAR.

From: Grzegorz Jaśkiewicz [mailto:gryz...@gmail.com] Sent: Monday, March 01, 2010 6:51 AM To: Plugge, Joe R. Cc: Scott Marlowe; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Autovacuum Tuning advice

Storing all fields as varchar surely doesn't make:
- indices small,
- the thing fly,
- tables small.
...
[PERFORM] Query slowing down significantly??
Hi all,

I'm quite puzzled by the following observation. The behaviour is observed on a production system (Linux, PG 8.3.5) and also on a test system (NetBSD 5.0.2, PG 8.4.2).

Normally the following query behaves well:

select c.*, h.*
from Context c, Context_Hierarchy h
where c.Idx = h.ContextIdx and c.ContextId='testID' and h.HierarchyName='InsuranceHierarchy' and h.ParentIdx=49292395
;

QUERY PLAN
--
Nested Loop (cost=0.00..43.57 rows=4 width=175) (actual time=0.291..0.293 rows=1 loops=1)
  -> Index Scan using uk_context_hierarchy_01 on context_hierarchy h (cost=0.00..14.76 rows=4 width=108) (actual time=0.169..0.169 rows=1 loops=1)
        Index Cond: (((hierarchyname)::text = 'InsuranceHierarchy'::text) AND (parentidx = 49292395))
  -> Index Scan using pk_context on context c (cost=0.00..7.20 rows=1 width=67) (actual time=0.110..0.111 rows=1 loops=1)
        Index Cond: (c.idx = h.contextidx)
        Filter: ((c.contextid)::text = 'testID'::text)
Total runtime: 0.388 ms
(7 rows)

(From a freshly started PG)

However during a long-term read-only transaction (actually just about 15 min) (the transaction is issuing about 10k-20k of such queries among others) PG is logging a number of the following:

Mar 1 09:58:09 gaia postgres[20126]: [25-1] LOG: 0: duration: 343.663 ms execute S_5: select c.*, h.Idx as h_Idx, h.WbuIdx as h_WbuIdx, h.OrigWbuIdx as h_OrigWbuIdx, h.Ts as h_Ts, h.UserId as h_UserId, h.ParentIdx as h_ParentIdx, h.ContextIdx as h_ContextIdx, h.HierarchyName as h_HierarchyName, h.HierarchyPath as h_HierarchyPath from Context c, Context_Hierarchy h where c.Idx = h.ContextIdx and c.ContextId=$1 and h.HierarchyName=$2 and h.ParentIdx=$3
Mar 1 09:58:09 gaia postgres[20126]: [25-2] DETAIL: parameters: $1 = 'testID', $2 = 'InsuranceHierarchy', $3 = '49292395'
Mar 1 09:58:09 gaia postgres[20126]: [25-3] LOCATION: exec_execute_message, postgres.c:1988

(About 200 in the current case.) This is from the test system. The given transaction was the only activity on the system at that time. 
While the transaction was still active, I issued the query in parallel, yielding the following plan (based on the logged message above):

QUERY PLAN
--
Nested Loop (cost=0.00..43.57 rows=4 width=175) (actual time=21.809..21.811 rows=1 loops=1)
  -> Index Scan using uk_context_hierarchy_01 on context_hierarchy h (cost=0.00..14.76 rows=4 width=108) (actual time=21.629..21.629 rows=1 loops=1)
        Index Cond: (((hierarchyname)::text = 'InsuranceHierarchy'::text) AND (parentidx = 49292395))
  -> Index Scan using pk_context on context c (cost=0.00..7.20 rows=1 width=67) (actual time=0.169..0.169 rows=1 loops=1)
        Index Cond: (c.idx = h.contextidx)
        Filter: ((c.contextid)::text = 'testID'::text)
Total runtime: 22.810 ms
(7 rows)

This still looks reasonable and is far from the >300 ms as logged. All this happens after the read-only transaction was active for a while. Any idea where to look for an explanation? Or what parameters could shed some light on the issue?

Regards, Rainer
Re: [PERFORM] Autovacuum Tuning advice
Storing all fields as varchar surely doesn't make:
- indices small,
- the thing fly,
- tables small.
...
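To illustrate the point, a hypothetical sketch of the kind of change being suggested (the assumption that these columns hold numeric values may not hold for the real application):

```sql
-- Hypothetical: native types instead of blanket varchar. A bigint key is a
-- fixed 8 bytes versus up to ~36 bytes for digits stored in varchar(32),
-- which shrinks the table and every index that includes the column, and
-- makes comparisons cheaper.
CREATE TABLE log_events_typed (
    callseq   bigint                          NOT NULL,
    msgseq    bigint                          NOT NULL,
    eventdate timestamp(0) without time zone  NOT NULL,
    PRIMARY KEY (callseq, msgseq)
);
```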
Re: [PERFORM] Autovacuum Tuning advice
Sorry, additional info:

OS is Red Hat Enterprise Linux ES release 4 (Nahant Update 5)

DISK - IBM DS4700 Array - 31 drives and 1 hot spare - RAID10 - 32MB stripe

Sysctl.conf:
kernel.shmmax=6442450944
kernel.shmall=1887436
kernel.msgmni=1024
kernel.msgmnb=65536
kernel.msgmax=65536
kernel.sem=250 256000 32 1024

Problem Child table: This table is partitioned so that after the data has rolled past 30 days, I can just drop the table.

Table "public.log_events_y2010m02"
    Column     |              Type              | Modifiers
---------------+--------------------------------+-----------
 callseq       | character varying(32)          | not null
 eventid       | character varying(40)          | not null
 msgseq        | character varying(32)          | not null
 eventdate     | timestamp(0) without time zone | not null
 hollyid       | character varying(20)          |
 ownerid       | character varying(60)          |
 spownerid     | character varying(60)          |
 applicationid | character varying(60)          |
 clid          | character varying(40)          |
 dnis          | character varying(40)          |
 param         | character varying(2000)        |
 docid         | character varying(40)          |
Indexes:
    "log_events_y2010m02_pk" PRIMARY KEY, btree (callseq, msgseq)
    "loev_eventid_idx_y2010m02" btree (eventid)
    "loev_ownerid_cidx_y2010m02" btree (ownerid, spownerid)
Check constraints:
    "log_events_y2010m02_eventdate_check" CHECK (eventdate >= '2010-02-01'::date AND eventdate < '2010-03-01'::date)
Inherits: log_events

Parent Table:

Table "public.log_events"
    Column     |              Type              | Modifiers
---------------+--------------------------------+-----------
 callseq       | character varying(32)          | not null
 eventid       | character varying(40)          | not null
 msgseq        | character varying(32)          | not null
 eventdate     | timestamp(0) without time zone | not null
 hollyid       | character varying(20)          |
 ownerid       | character varying(60)          |
 spownerid     | character varying(60)          |
 applicationid | character varying(60)          |
 clid          | character varying(40)          |
 dnis          | character varying(40)          |
 param         | character varying(2000)        |
 docid         | character varying(40)          |
Triggers:
    insert_log_events_trigger BEFORE INSERT ON log_events FOR EACH ROW EXECUTE PROCEDURE insert_log_events()

schemaname | tablename | size_pretty | total_size_pretty
-----------+-----------+-------------+------------------
 public |
log_events_y2010m02 | 356 GB | 610 GB

-----Original Message----- From: Scott Marlowe [mailto:scott.marl...@gmail.com] Sent: Monday, March 01, 2010 12:58 AM To: Plugge, Joe R. Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Autovacuum Tuning advice

On Sun, Feb 28, 2010 at 8:09 PM, Plugge, Joe R. wrote: > I have a very busy system that takes about 9 million inserts per day and each > record gets updated at least once after the insert (all for the one same > table), there are other tables that get hit but not as severely. As > suspected I am having a problem with table bloat. Any advice on how to be > more aggressive with autovacuum? I am using 8.4.1. My machine has 4 Intel > Xeon 3000 MHz Processors with 8 GB of Ram.

What kind of drive system do you have? That's far more important than CPU and RAM. Let's look at a two-pronged attack.

1: What can you maybe do to reduce the number of updates for each row? If you do something like:

update row set field1='xyz' where id=4;
update row set field2='www' where id=4;

and you can combine those updates, that's a big savings. Can you benefit from HOT updates by removing some indexes? Updating indexed fields can cost a fair bit more than updating non-indexed ones IF you have a < 100% fill factor and therefore free room in each page for a few extra rows.

2: Vacuum tuning.

> Currently I am using only defaults for autovac.

This one: #autovacuum_vacuum_cost_delay = 20ms is very high for a busy system with a powerful io subsystem. I run my production servers with 1ms to 4ms so they can keep up. Lastly there are some settings you can make per table for autovac you can look into (i.e. set cost_delay to 0 for this table), or you can turn off autovac for this one table and then run a regular vac with no cost_delay on it every minute or two.
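Scott's last suggestion, per-table autovacuum settings, can be done on 8.4 with storage parameters; a sketch using the partition shown above (the values are illustrative starting points, not recommendations):

```sql
-- Per-table autovacuum tuning (8.4 storage parameters): drop the cost delay
-- and vacuum the hot partition more aggressively than the global defaults.
ALTER TABLE log_events_y2010m02 SET (
    autovacuum_vacuum_cost_delay = 0,
    autovacuum_vacuum_scale_factor = 0.05
);

-- Alternatively, disable autovacuum for just this table and run a manual
-- vacuum (which uses vacuum_cost_delay, 0 by default) on a schedule:
ALTER TABLE log_events_y2010m02 SET (autovacuum_enabled = false);
VACUUM log_events_y2010m02;
```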
Re: [PERFORM] Autovacuum Tuning advice
Joe wrote: > I have a very busy system that takes about 9 million inserts per day and each > record gets > updated at least once after the insert (all for the one same table), there > are other tables that > get hit but not as severely. As suspected I am having a problem with table > bloat. Any advice > on how to be more aggressive with autovacuum? I am using 8.4.1. My machine > has 4 Intel > Xeon 3000 MHz Processors with 8 GB of Ram. > > Currently I am using only defaults for autovac. > > shared_buffers = 768MB # min 128kB > work_mem = 1MB # min 64kB > maintenance_work_mem = 384MB

Operating system? Any messages in logs?

Greg W.