Re: File systems (RE: [PERFORM] Sanity check requested)
"Nick Fankhauser" <[EMAIL PROTECTED]> writes: > I'm departing in three ways from the simple IDE > model that (I presume) the default random page cost of 4 is based on- The > disks are SCSI & RAID and the FS would be different. Actually, the default 4 is based on experiments I did quite awhile back on HPUX (with a SCSI disk) and Linux (with an IDE disk, and a different filesystem). I didn't see too much difference between 'em. RAID might alter the equation, or not. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Sanity check requested
> ...About one year ago I considered moving to a journaling file system, but
> opted not to because it seems like that's what WAL does for us already. How
> does putting a journaling file system under it add more reliability?

WAL only works if the WAL files are actually written to disk and can be read off it again. Ext2 has a number of deficiencies which can cause problems with this basic operation (inode corruptions, etc). Journaling does not directly help.
Re: [PERFORM] Sanity check requested
Nick,

> ...About one year ago I considered moving to a journaling file system, but
> opted not to because it seems like that's what WAL does for us already. How
> does putting a journaling file system under it add more reliability?

It lets you restart your server quickly after an unexpected power-out. Ext2 is notoriously bad about this. Also, WAL cannot necessarily recover properly if the underlying filesystem is corrupted.

> I also guessed that a journaling file system would add overhead because now
> a write to the WAL file could itself be deferred and logged elsewhere.

You are correct.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
Re: [PERFORM] Sanity check requested
I'm confused:

Ang Chin Han wrote:
> We've been using ext3fs for our production systems. (Red Hat Advanced
> Server 2.1)

Vincent van Leeuwen wrote:
> I'd upgrade to a journaling filesystem as soon as possible for
> reliability.

...About one year ago I considered moving to a journaling file system, but opted not to because it seems like that's what WAL does for us already. How does putting a journaling file system under it add more reliability?

I also guessed that a journaling file system would add overhead because now a write to the WAL file could itself be deferred and logged elsewhere.

...So now I'm really puzzled because folks are weighing in with solid anecdotal evidence saying that I'll get both better reliability and performance. Can someone explain what I'm missing about the concept?

-A puzzled Nick
Re: [PERFORM] index / sequential scan problem
On Fri, 18 Jul 2003, Tom Lane wrote:

> >> Adjusting the cpu_tuple_cost to 0.042 got the planner to choose the index.
> >
> > Doesn't sound very good and it will most likely make other queries slower.
>
> Seems like a reasonable approach to me --- certainly better than setting
> random_page_cost to physically nonsensical values.

Hehe, just before this letter there was talk about changing random_page_cost. I kind of responded that 0.042 is not a good random page cost. But now of course I can see that it says cpu_tuple_cost :-)

Sorry for adding confusion.

--
/Dennis
Re: [PERFORM] Yet another slow join query.. [ SOLVED ]
The types of the join columns were different (text vs. varchar(100)). Now it's working fine and using a Hash Join.

Thanks once again.

regds
mallah.

explain analyze select b.state,a.city from data_bank.updated_profiles a join public.city_master b using(city) where source='BRANDING' and a.state is NULL and b.country='India' ;

 QUERY PLAN
---
 Hash Join  (cost=2806.09..3949.37 rows=28 width=92) (actual time=183.05..326.52 rows=18285 loops=1)
   Hash Cond: ("outer".city = "inner".city)
   ->  Index Scan using city_master_temp1 on city_master b  (cost=0.00..854.87 rows=5603 width=24) (actual time=0.17..45.70 rows=5603 loops=1)
         Filter: (country = 'India'::character varying)
   ->  Hash  (cost=2805.65..2805.65 rows=178 width=68) (actual time=181.74..181.74 rows=0 loops=1)
         ->  Seq Scan on updated_profiles a  (cost=0.00..2805.65 rows=178 width=68) (actual time=20.53..149.66 rows=17537 loops=1)
               Filter: ((source = 'BRANDING'::character varying) AND (state IS NULL))
 Total runtime: 348.50 msec
(8 rows)

> On Fri, 18 Jul 2003, Rajesh Kumar Mallah wrote:
>
>> Hi All,
>>
>> data_bank.updated_profiles and public.city_master are small tables
>> with 21790 and 49303 records respectively. Both have indexes on the
>> join column: in the first one on (city,source) and in the second one on (city).
>>
>> The query below does not return for long durations > 10 mins.
>>
>> explain analyze select b.state,a.city from data_bank.updated_profiles
>> a join public.city_master b using(city) where source='BRANDING' and
>> a.state is NULL and b.country='India' ;
>>
>>
>> simple explain returns below.
>>
>> ~~
>>
>> Nested Loop  (cost=0.00..83506.31 rows=14 width=35)
>>   Join Filter: ("outer".city = ("inner".city)::text)
>>   ->  Seq Scan on updated_profiles a  (cost=0.00..1376.39 rows=89 width=11)
>>         Filter: ((source = 'BRANDING'::character varying) AND (state IS NULL))
>>   ->  Index Scan using city_master_temp1 on city_master b  (cost=0.00..854.87 rows=5603 width=24)
>>         Filter: (country = 'India'::character varying)
>> (6 rows)
>
> How many rows actually meet the filter conditions on updated_profiles
> and city_master? Are the two city columns of the same type?
Re: [PERFORM] Sanity check requested
>> > Be sure to mount noatime
>>
>> I did "chattr -R +A /var/lib/pgsql/data"
>> that should do the trick as well or am I wrong?
>>
>
> According to the man page it gives the same effect.
> There are a few things you should consider though:
> - new files won't be created with the same options (I think),
>   so you'll have to run this command as a daily cronjob or
>   something to that effect

This would be a really interesting point to know. I will look into this.

I think the advantage of "chattr" is that the last access time is still available for the rest of the filesystem. (Of course you could have a separate filesystem just for the database stuff, in which case the advantage would be moot.)

regards,
Oli
Re: [PERFORM] Sanity check requested
On 2003-07-18 18:20:55 +0200, Oliver Scheit wrote:
> > Be sure to mount noatime
>
> I did "chattr -R +A /var/lib/pgsql/data"
> that should do the trick as well or am I wrong?
>

According to the man page it gives the same effect. There are a few things you should consider though:
- new files won't be created with the same options (I think), so you'll have to run this command as a daily cronjob or something to that effect
- chattr is probably more filesystem-specific than a noatime mount, although this isn't a problem on ext[23] of course

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/
Re: [PERFORM] Sanity check requested
> Be sure to mount noatime

I did "chattr -R +A /var/lib/pgsql/data"
that should do the trick as well or am I wrong?

regards,
Oli
Re: [PERFORM] Yet another slow join query..
On Fri, 18 Jul 2003, Rajesh Kumar Mallah wrote:

> Hi All,
>
> data_bank.updated_profiles and public.city_master are small tables
> with 21790 and 49303 records respectively. Both have indexes on the join
> column: in the first one on (city,source) and in the second one on (city).
>
> The query below does not return for long durations > 10 mins.
>
> explain analyze select b.state,a.city from data_bank.updated_profiles a join
> public.city_master b using(city) where source='BRANDING' and a.state is NULL
> and b.country='India' ;
>
>
> simple explain returns below.
>
> ~~
>
> Nested Loop  (cost=0.00..83506.31 rows=14 width=35)
>   Join Filter: ("outer".city = ("inner".city)::text)
>   ->  Seq Scan on updated_profiles a  (cost=0.00..1376.39 rows=89 width=11)
>         Filter: ((source = 'BRANDING'::character varying) AND (state IS NULL))
>   ->  Index Scan using city_master_temp1 on city_master b  (cost=0.00..854.87 rows=5603 width=24)
>         Filter: (country = 'India'::character varying)
> (6 rows)

How many rows actually meet the filter conditions on updated_profiles and city_master? Are the two city columns of the same type?
Re: [PERFORM] Sanity check requested
On 2003-07-17 10:41:35 -0500, Nick Fankhauser wrote:
> I'm using ext2. For now, I'll leave this and the OS version alone. If I
>

I'd upgrade to a journaling filesystem as soon as possible for reliability. Testing in our own environment has shown that PostgreSQL performs best on ext3 (yes, better than XFS, JFS or ReiserFS) with a linux 2.4.21 kernel. Be sure to mount noatime and to create the ext3 partition with the correct stripe size of your RAID array using the '-R stride=foo' option (see man mke2fs).

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/
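As a rough illustration of the stride advice above: the stride is the RAID chunk size expressed in filesystem blocks. The sketch below only does that arithmetic and prints a candidate command line. The function names, the 64 KiB chunk size and the /dev/sdX1 device are made-up placeholders, not values from this thread, so check man mke2fs for your version before running anything.

```python
def ext3_stride(raid_chunk_kib: int, fs_block_kib: int = 4) -> int:
    """Return the stride: the RAID chunk size measured in filesystem blocks."""
    if raid_chunk_kib <= 0 or fs_block_kib <= 0:
        raise ValueError("sizes must be positive")
    if raid_chunk_kib % fs_block_kib != 0:
        raise ValueError("chunk size should be a multiple of the block size")
    return raid_chunk_kib // fs_block_kib


def mke2fs_command(device: str, raid_chunk_kib: int, fs_block_kib: int = 4) -> str:
    """Build (but do not run) an mke2fs command line for an ext3 filesystem."""
    stride = ext3_stride(raid_chunk_kib, fs_block_kib)
    # -j adds the ext3 journal, -b takes the block size in bytes.
    return f"mke2fs -j -b {fs_block_kib * 1024} -R stride={stride} {device}"


if __name__ == "__main__":
    # Hypothetical array: 64 KiB chunks, 4 KiB filesystem blocks.
    print(mke2fs_command("/dev/sdX1", raid_chunk_kib=64))
```

The command is only printed, never executed, so it is safe to experiment with different chunk sizes.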
Re: [PERFORM] Sanity check requested
On Fri, 18 Jul 2003, Ang Chin Han wrote:

> Shridhar Daithankar wrote:
> > On 17 Jul 2003 at 10:41, Nick Fankhauser wrote:
> >
> >>I'm using ext2. For now, I'll leave this and the OS version alone. If I
> >
> >
> > I appreciate your approach but it almost proven that ext2 is not the best and
> > fastest out there.
>
> Agreed.

Huh? How can journaled file systems hope to outrun a simple unjournaled file system? There's just less overhead for ext2 so it's quicker, it's just not as reliable.

I point you to this link from IBM:

http://www-124.ibm.com/developerworks/opensource/linuxperf/iozone/iozone.php

While ext3 is a clear loser to jfs and rfs, ext2 wins most of the contests against both reiser and jfs. Note that xfs wasn't tested here. But in general, ext2 is quite fast nowadays.

> > IMO, you can safely change that to reiserfs or XFS. Or course, testing is
> > always recommended.
>
> We've been using ext3fs for our production systems. (Red Hat Advanced
> Server 2.1)
>
> And since your (Nick) system is based on Debian, I have done some rough
> testing on Debian sarge (testing) (with custom 2.4.20) with ext3fs,
> reiserfs and jfs. Can't get XFS going easily on Debian, though.
>
> I used a single partition mkfs'd with ext3fs, reiserfs and jfs one after
> the other on an IDE disk. Ran pgbench and osdb-x0.15-0 on it.
>
> jfs's has been underperforming for me. Somehow the CPU usage is higher
> than the other two. As for ext3fs and reiserfs, I can't detect any
> significant difference. So if you're in a hurry, it'll be easier to
> convert your ext2 to ext3 (using tune2fs) and use that. Otherwise, it'd
> be nice if you could do your own testing, and post it to the list.

I would like to see some tests on how they behave on top of large fast RAID arrays, like a 10 disk RAID5 or something. It's likely that on a single IDE drive the most limiting factor is the bandwidth of the drive, whereas on a large array, the limiting factor would likely be the file system code.
File systems (RE: [PERFORM] Sanity check requested)
Thanks for the suggestions on the FS types, especially the Debian-oriented info. I'll start by playing with the memory allocation parameters that I originally listed (seems like they should provide results in a way that is unaffected by the disk IO). Then once I have them at optimal values, move on to trying different file systems.

I assume that as I make changes that affect the disk IO performance, I'll then need to do some testing to find new values for the IO cost for the planner. Do you folks have some ballpark numbers to start with for this based on your experience? I'm departing in three ways from the simple IDE model that (I presume) the default random page cost of 4 is based on: the disks are SCSI & RAID and the FS would be different.

At this point, I can't think of any better way to test this than simply running my local test suite with various values and recording the wall-clock results. Is there a different approach that might make more sense? (This means that my results will be skewed to my environment, but I'll post them anyway.)

I'll post results back to the list as I get to it. It might be a slow process. Since I spend about 18 hours of each day keeping the business running, I'll have to cut back on sleep & do this in the other 10 hours.

-NF

> Shridhar Daithankar wrote:
> I appreciate your approach but it almost proven that ext2 is
> not the best and fastest out there.
>
> Agreed.

> Ang Chin Han wrote:
> We've been using ext3fs for our production systems. (Red Hat Advanced
> Server 2.1)
>
> And since your (Nick) system is based on Debian, I have done some rough
> testing on Debian sarge (testing) (with custom 2.4.20) with ext3fs,
> reiserfs and jfs. Can't get XFS going easily on Debian, though.
>
> I used a single partition mkfs'd with ext3fs, reiserfs and jfs one after
> the other on an IDE disk. Ran pgbench and osdb-x0.15-0 on it.
>
> jfs's has been underperforming for me. Somehow the CPU usage is higher
> than the other two. As for ext3fs and reiserfs, I can't detect any
> significant difference. So if you're in a hurry, it'll be easier to
> convert your ext2 to ext3 (using tune2fs) and use that. Otherwise, it'd
> be nice if you could do your own testing, and post it to the list.
>
> --
> Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386
> GNU/Linux
> 2:30pm up 204 days, 5:35, 5 users, load average: 5.50, 5.18, 5.13
Re: [PERFORM] index / sequential scan problem
On Fri, 18 Jul 2003, Tom Lane wrote:

> Dennis Björklund <[EMAIL PROTECTED]> writes:
> > On Fri, 18 Jul 2003, Fabian Kreitner wrote:
> >> Adjusting the cpu_tuple_cost to 0.042 got the planner to choose the index.
>
> > Doesn't sound very good and it will most likely make other queries slower.
>
> Seems like a reasonable approach to me --- certainly better than setting
> random_page_cost to physically nonsensical values.
>
> In a fully-cached situation it's entirely reasonable to inflate the
> various cpu_xxx costs, since by assumption you are not paying the normal
> price of physical disk I/O. Fetching a page from kernel buffer cache
> is certainly cheaper than getting it off the disk. But the CPU costs
> involved in processing the page contents don't change. Since our cost
> unit is defined as 1.0 = one sequential page fetch, you have to increase
> the cpu_xxx numbers instead of reducing the I/O cost estimate.
>
> I would recommend inflating all the cpu_xxx costs by the same factor,
> unless you have evidence that they are wrong in relation to each other.

And don't forget to set effective_cache_size. It's the one I missed for the longest when I started.
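For anyone wondering what number to put in effective_cache_size: in the PostgreSQL releases of this era the setting is expressed in disk pages (normally 8 kB each), not in bytes. A small sketch of the conversion follows. The 768 MiB figure is an invented example, not a recommendation from this thread, and the function name is hypothetical.

```python
def effective_cache_size_pages(cache_mib: int, page_kib: int = 8) -> int:
    """Convert an estimate of the OS disk cache (in MiB) to a page count."""
    if cache_mib < 0 or page_kib <= 0:
        raise ValueError("cache size must be >= 0 and page size > 0")
    return cache_mib * 1024 // page_kib


if __name__ == "__main__":
    # Hypothetical machine where roughly 768 MiB of RAM ends up as kernel cache.
    pages = effective_cache_size_pages(768)
    print(f"effective_cache_size = {pages}")
```

The estimate of how much memory the kernel really uses for caching has to come from your own system (for example from the output of free or top).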
Re: [PERFORM] Clearing rows periodically
Martin Foster <[EMAIL PROTECTED]> writes:
> My question is, should the purging of rows be done more often then once
> a day for both tables. Is this why performance seems to take a hit
> specifically?

Given that the hourly purge seems to work well for you, I'd suggest trying it on both tables.

Non-FULL vacuum is intended to be run *frequently*, say as often as you've updated or deleted 10% to 50% of the rows in a table. Delaying it until you've had multiple complete turnovers of the table contents will cost you.

> As there were too many rows purged for vacuum to
> accurately keep track of?

Only possible if you don't have the FSM parameters set high enough. Infrequent vacuuming means you need more FSM space, btw.

regards, tom lane
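The "10% to 50% of the rows" rule of thumb above can be turned into a trivial check for a cron-driven maintenance script. This is only a sketch under assumptions: the function name and the 25% default threshold are made up, and the caller is expected to supply its own counts of changed rows.

```python
def should_vacuum(rows_changed: int, total_rows: int, threshold: float = 0.25) -> bool:
    """Return True once the updated/deleted fraction of the table reaches the threshold."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    if total_rows <= 0:
        # An empty table with recorded changes still has dead rows to reclaim.
        return rows_changed > 0
    return rows_changed / total_rows >= threshold


if __name__ == "__main__":
    # Hypothetical 50K-row table: 5,000 deletions is exactly 10% turnover.
    print(should_vacuum(5_000, 50_000, threshold=0.10))
    print(should_vacuum(1_000, 50_000))
```

With a threshold between 0.10 and 0.50 this mirrors the range suggested above; the right value for a given table is something to find by testing.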
Re: [PERFORM] Hardware performance
> > >Adam Witney wrote:
> [snip]
> > If you would go with that one, make sure to get the optional BBWC
> > (Battery Backed Write Cache). Without it the controller won't enable
> > the write-back cache (which it really shouldn't, since it wouldn't be
> > safe without the batteries). WB cache can really speed things up in
> > many db situations - it's sort of like "speed of fsync off, security
> > of fsync on". I've seen huge speedups with both postgresql and other
> > databases on that.
>
> Don't forget to check the batteries!!! And if you have an
> HPaq service contract, don't rely on them to do it...

That's what management software is for.. :-)

(Yes, it does check the batteries. They are also reported on reboot, but you don't want to do that often, of course)

Under the service contract, HP will *replace* the batteries for free, though - but you have to know when to replace them.

//Magnus
Re: [PERFORM] index / sequential scan problem
Dennis Björklund <[EMAIL PROTECTED]> writes:
> On Fri, 18 Jul 2003, Fabian Kreitner wrote:
>> Adjusting the cpu_tuple_cost to 0.042 got the planner to choose the index.

> Doesn't sound very good and it will most likely make other queries slower.

Seems like a reasonable approach to me --- certainly better than setting random_page_cost to physically nonsensical values.

In a fully-cached situation it's entirely reasonable to inflate the various cpu_xxx costs, since by assumption you are not paying the normal price of physical disk I/O. Fetching a page from kernel buffer cache is certainly cheaper than getting it off the disk. But the CPU costs involved in processing the page contents don't change. Since our cost unit is defined as 1.0 = one sequential page fetch, you have to increase the cpu_xxx numbers instead of reducing the I/O cost estimate.

I would recommend inflating all the cpu_xxx costs by the same factor, unless you have evidence that they are wrong in relation to each other.

regards, tom lane
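To make "inflate all the cpu_xxx costs by the same factor" concrete, here is a small sketch that scales the three CPU cost settings together and prints SET statements to try in a session. The baseline values are what I believe the defaults were in releases of this period and should be treated as an assumption (check SHOW cpu_tuple_cost and friends on your own server). The factor of 4.2 is simply the one that turns 0.01 into the 0.042 mentioned in this thread, and the function names are hypothetical.

```python
# Assumed baseline values; verify against your own installation.
DEFAULT_CPU_COSTS = {
    "cpu_tuple_cost": 0.01,
    "cpu_index_tuple_cost": 0.001,
    "cpu_operator_cost": 0.0025,
}


def scale_cpu_costs(factor: float, costs: dict = DEFAULT_CPU_COSTS) -> dict:
    """Multiply every cpu_xxx cost by the same factor, keeping their ratios."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return {name: round(value * factor, 6) for name, value in costs.items()}


def as_set_statements(costs: dict) -> list:
    """Render the settings as per-session SET statements."""
    return [f"SET {name} = {value};" for name, value in costs.items()]


if __name__ == "__main__":
    for statement in as_set_statements(scale_cpu_costs(4.2)):
        print(statement)
```

Using per-session SET statements keeps the experiment local, so other queries are unaffected until the values are moved into postgresql.conf.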
[PERFORM] Yet another slow join query..
Hi All,

data_bank.updated_profiles and public.city_master are small tables with 21790 and 49303 records respectively. Both have indexes on the join column: in the first one on (city,source) and in the second one on (city).

The query below does not return for long durations > 10 mins.

explain analyze select b.state,a.city from data_bank.updated_profiles a join public.city_master b using(city) where source='BRANDING' and a.state is NULL and b.country='India' ;

simple explain returns below.

~~

 Nested Loop  (cost=0.00..83506.31 rows=14 width=35)
   Join Filter: ("outer".city = ("inner".city)::text)
   ->  Seq Scan on updated_profiles a  (cost=0.00..1376.39 rows=89 width=11)
         Filter: ((source = 'BRANDING'::character varying) AND (state IS NULL))
   ->  Index Scan using city_master_temp1 on city_master b  (cost=0.00..854.87 rows=5603 width=24)
         Filter: (country = 'India'::character varying)
(6 rows)

Any help is appreciated.

Regds
mallah.
Re: [PERFORM] Clearing rows periodically
On Fri, Jul 18, 2003 at 12:55:12AM -0600, Martin Foster wrote:
> The other table follows a sequential order and carries more columns of
> information. However, this table clears its entries nightly and with
> current settings will delete roughly a day's traffic sitting at 50K rows
> of information.

> has been skipped, which includes the use of VACUUM ANALYZE EXPLAIN.
> This seems to be an indication that the process of a daily delete is
> actually a very wise step to take, even if the information itself is not
> needed for very long.
>
> A VACUUM FULL will correct the issue, but put the site out of commission
> for roughly 20 minutes as the drive crunches the information.

During your "clearing period", why not do the deletes in batches, and VACUUM the table periodically? That will allow you to reclaim the space gradually, and ensure that you don't end up with a big "bald spot". But you probably want to increase your FSM settings. See the docs.

A

--
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<[EMAIL PROTECTED]>                              M2P 2A8
                                         +1 416 646 3304 x110
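A sketch of the "delete in batches, VACUUM in between" idea, written as a generator of SQL statements rather than something that talks to the database. The table name chat_log, the key column id and the batch size are all hypothetical, and it assumes the table has a sequential integer key as described above. Note that VACUUM cannot run inside a transaction block, so each statement would need to be sent on its own.

```python
def batched_purge(table: str, key: str, first_id: int, last_id: int, batch_size: int) -> list:
    """Return alternating DELETE / VACUUM statements covering first_id..last_id."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    statements = []
    start = first_id
    while start <= last_id:
        end = min(start + batch_size - 1, last_id)
        statements.append(f"DELETE FROM {table} WHERE {key} BETWEEN {start} AND {end};")
        # Plain (non-FULL) vacuum after each batch so freed space is reusable early.
        statements.append(f"VACUUM ANALYZE {table};")
        start = end + 1
    return statements


if __name__ == "__main__":
    # Hypothetical purge of 50,000 rows in batches of 10,000.
    for statement in batched_purge("chat_log", "id", 1, 50_000, 10_000):
        print(statement)
```

Smaller batches mean more VACUUM passes but less dead space at any one time; where the balance lies depends on the FSM settings and the load on the box.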
Re: [PERFORM] Hardware performance
On Thu, 2003-07-17 at 13:55, Magnus Hagander wrote:
> >Adam Witney wrote:
[snip]
> If you would go with that one, make sure to get the optional BBWC
> (Battery Backed Write Cache). Without it the controller won't enable the
> write-back cache (which it really shouldn't, since it wouldn't be safe
> without the batteries). WB cache can really speed things up in many db
> situations - it's sort of like "speed of fsync off, security of fsync
> on". I've seen huge speedups with both postgresql and other databases on
> that.

Don't forget to check the batteries!!! And if you have an HPaq service contract, don't rely on them to do it...

--
Ron Johnson, Jr.    Home: [EMAIL PROTECTED]
Jefferson, LA USA

"I'm not a vegetarian because I love animals, I'm a vegetarian
because I hate vegetables!"
    unknown
Re: [PERFORM] Hardware performance
On Wed, 2003-07-16 at 23:25, Roman Fail wrote:
[snip]
> has every bit of redundancy you can order. While uncommon, the
> backplane is one of the many single points of failure!

Unless you go with a shared-disk cluster (Oracle 9iRAC or OpenVMS) or replication.

Face it, if your pockets are deep enough, you can make everything redundant and burden-sharing (i.e., not just waiting for the master system to die). (And with some enterprise FC controllers, you can mirror the disks many kilometers away.)

--
Ron Johnson, Jr.    Home: [EMAIL PROTECTED]
Jefferson, LA USA

"I'm not a vegetarian because I love animals, I'm a vegetarian
because I hate vegetables!"
    unknown
Re: [PERFORM] Clearing rows periodically
On 18/07/2003 07:55 Martin Foster wrote:
> [snip]
> A VACUUM FULL will correct the issue, but put the site out of commission
> for roughly 20 minutes as the drive crunches the information.
>
> My question is, should the purging of rows be done more often than once
> a day for both tables. Is this why performance seems to take a hit
> specifically? As there were too many rows purged for vacuum to
> accurately keep track of?

ISTR that there are settings in postgresql.conf which affect how many tables/rows vacuum can reclaim. The docs say that the default setting of max_fsm_pages is 1. Maybe this should be increased for your situation?

HTH

--
Paul Thomas
Thomas Micro Systems Limited | Software Solutions for the Smaller Business
Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk
Re: [PERFORM] Sanity check requested
Shridhar Daithankar wrote:
> On 17 Jul 2003 at 10:41, Nick Fankhauser wrote:
>
>>I'm using ext2. For now, I'll leave this and the OS version alone. If I
>
> I appreciate your approach but it almost proven that ext2 is not the best and
> fastest out there.

Agreed.

> IMO, you can safely change that to reiserfs or XFS. Or course, testing is
> always recommended.

We've been using ext3fs for our production systems. (Red Hat Advanced Server 2.1)

And since your (Nick) system is based on Debian, I have done some rough testing on Debian sarge (testing) (with custom 2.4.20) with ext3fs, reiserfs and jfs. Can't get XFS going easily on Debian, though.

I used a single partition mkfs'd with ext3fs, reiserfs and jfs one after the other on an IDE disk. Ran pgbench and osdb-x0.15-0 on it.

jfs's has been underperforming for me. Somehow the CPU usage is higher than the other two. As for ext3fs and reiserfs, I can't detect any significant difference. So if you're in a hurry, it'll be easier to convert your ext2 to ext3 (using tune2fs) and use that. Otherwise, it'd be nice if you could do your own testing, and post it to the list.

--
Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux
2:30pm up 204 days, 5:35, 5 users, load average: 5.50, 5.18, 5.13