File systems (RE: [PERFORM] Sanity check requested)
Thanks for the suggestions on the FS types, especially the Debian-oriented info. I'll start by playing with the memory allocation parameters that I originally listed (it seems they should provide results in a way that is unaffected by the disk IO). Then, once I have them at optimal values, I'll move on to trying different file systems.

I assume that as I make changes that affect disk IO performance, I'll then need to do some testing to find new values for the IO cost for the planner. Do you folks have some ballpark numbers to start with, based on your experience? I'm departing in three ways from the simple IDE model that (I presume) the default random page cost of 4 is based on: the disks are SCSI, they are in a RAID array, and the FS would be different.

At this point, I can't think of any better way to test this than simply running my local test suite with various values and recording the wall-clock results. Is there a different approach that might make more sense? (This means that my results will be skewed to my environment, but I'll post them anyway.)

I'll post results back to the list as I get to it. It might be a slow process: since I spend about 18 hours of each day keeping the business running, I'll have to cut back on sleep and do this in the other 10 hours. <g>

-NF

Shridhar Daithankar wrote:
> I appreciate your approach but it is almost proven that ext2 is not the best and fastest out there.

Agreed.

Ang Chin Han wrote:
> We've been using ext3fs for our production systems. (Red Hat Advanced Server 2.1)
>
> And since your (Nick) system is based on Debian, I have done some rough testing on Debian sarge (testing) (with custom 2.4.20) with ext3fs, reiserfs and jfs. Can't get XFS going easily on Debian, though.
>
> I used a single partition mkfs'd with ext3fs, reiserfs and jfs one after the other on an IDE disk. Ran pgbench and osdb-x0.15-0 on it.
>
> jfs has been underperforming for me. Somehow the CPU usage is higher than the other two. As for ext3fs and reiserfs, I can't detect any significant difference. So if you're in a hurry, it'll be easier to convert your ext2 to ext3 (using tune2fs) and use that. Otherwise, it'd be nice if you could do your own testing, and post it to the list.
>
> --
> Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux
> 2:30pm up 204 days, 5:35, 5 users, load average: 5.50, 5.18, 5.13

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
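Nick's plan above (run the local test suite under several planner cost values and record wall-clock time) could be sketched roughly as follows. This is only an illustration: `time_settings`, `run_suite` and `fake_suite` are hypothetical names, and the stand-in workload would have to be replaced by something that actually applies the setting and runs the real tests.

```python
import time
from typing import Callable, Dict, Iterable


def time_settings(values: Iterable[float],
                  run_suite: Callable[[float], None],
                  repeats: int = 3) -> Dict[float, float]:
    """Return the best wall-clock time (in seconds) observed for each value."""
    results = {}
    for value in values:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_suite(value)  # hypothetical: apply the setting, then run the tests
            timings.append(time.perf_counter() - start)
        # Taking the minimum reduces noise from unrelated system activity.
        results[value] = min(timings)
    return results


def fake_suite(random_page_cost: float) -> None:
    # Stand-in workload so the sketch runs anywhere; it does NOT touch a database.
    sum(range(int(10_000 * random_page_cost)))


if __name__ == "__main__":
    for value, seconds in time_settings([1.5, 2.0, 3.0, 4.0], fake_suite).items():
        print(f"random_page_cost={value}: {seconds:.4f}s")
```

Repeating each run and keeping the fastest time is one common way to damp noise; caches warmed by an earlier run can still skew results, so the order of the values may matter.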
Re: [PERFORM] Sanity check requested
On Fri, 18 Jul 2003, Ang Chin Han wrote:
> Shridhar Daithankar wrote:
> > On 17 Jul 2003 at 10:41, Nick Fankhauser wrote:
> > > I'm using ext2. For now, I'll leave this and the OS version alone. If I
> >
> > I appreciate your approach but it is almost proven that ext2 is not the best and fastest out there.
>
> Agreed.

Huh? How can journaled file systems hope to outrun a simple unjournaled file system? There's just less overhead for ext2, so it's quicker; it's just not as reliable.

I point you to this link from IBM:

http://www-124.ibm.com/developerworks/opensource/linuxperf/iozone/iozone.php

While ext3 is a clear loser to jfs and reiserfs, ext2 wins most of the contests against both reiserfs and jfs. Note that xfs wasn't tested here. But in general, ext2 is quite fast nowadays.

> > IMO, you can safely change that to reiserfs or XFS. Of course, testing is always recommended.
>
> We've been using ext3fs for our production systems. (Red Hat Advanced Server 2.1)
>
> And since your (Nick) system is based on Debian, I have done some rough testing on Debian sarge (testing) (with custom 2.4.20) with ext3fs, reiserfs and jfs. Can't get XFS going easily on Debian, though.
>
> I used a single partition mkfs'd with ext3fs, reiserfs and jfs one after the other on an IDE disk. Ran pgbench and osdb-x0.15-0 on it.
>
> jfs has been underperforming for me. Somehow the CPU usage is higher than the other two. As for ext3fs and reiserfs, I can't detect any significant difference. So if you're in a hurry, it'll be easier to convert your ext2 to ext3 (using tune2fs) and use that. Otherwise, it'd be nice if you could do your own testing, and post it to the list.

I would like to see some tests on how they behave on top of large fast RAID arrays, like a 10 disk RAID5 or something. It's likely that on a single IDE drive the most limiting factor is the bandwidth of the drive, whereas on a large array, the limiting factor would likely be the file system code.
Re: [PERFORM] Sanity check requested
On 2003-07-17 10:41:35 -0500, Nick Fankhauser wrote:
> I'm using ext2. For now, I'll leave this and the OS version alone. If I

I'd upgrade to a journaling filesystem as soon as possible for reliability.

Testing in our own environment has shown that PostgreSQL performs best on ext3 (yes, better than XFS, JFS or ReiserFS) with a Linux 2.4.21 kernel. Be sure to mount noatime and to create the ext3 partition with the correct stripe size of your RAID array using the '-R stride=foo' option (see man mke2fs).

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/
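As a rough aid for the '-R stride=foo' advice above: the stride is normally the RAID chunk size expressed in filesystem blocks, so it can be computed as chunk size divided by block size. The sketch below assumes that relationship; the helper name `ext_stride` and the 64 KiB chunk / 4 KiB block figures are illustrative assumptions, not values from the thread, so check them against your own array and man mke2fs.

```python
def ext_stride(chunk_size_kib: int, block_size_kib: int = 4) -> int:
    """Return the stride (filesystem blocks per RAID chunk)."""
    if chunk_size_kib <= 0 or block_size_kib <= 0:
        raise ValueError("sizes must be positive")
    if chunk_size_kib % block_size_kib != 0:
        raise ValueError("chunk size should be a multiple of the block size")
    return chunk_size_kib // block_size_kib


if __name__ == "__main__":
    # Assumed example: 64 KiB RAID chunks, 4 KiB filesystem blocks.
    print(f"-R stride={ext_stride(64)}")  # prints "-R stride=16"
```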
Re: [PERFORM] Sanity check requested
> Be sure to mount noatime

I did chattr -R +A /var/lib/pgsql/data
That should do the trick as well, or am I wrong?

regards,
Oli
Re: [PERFORM] Sanity check requested
On 2003-07-18 18:20:55 +0200, Oliver Scheit wrote:
> > Be sure to mount noatime
>
> I did chattr -R +A /var/lib/pgsql/data
> that should do the trick as well or am I wrong?

According to the man page it gives the same effect. There are a few things you should consider though:
- new files won't be created with the same options (I think), so you'll have to run this command as a daily cronjob or something to that effect
- chattr is probably more filesystem-specific than a noatime mount, although this isn't a problem on ext[23] of course

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/
Re: [PERFORM] Sanity check requested
> > > Be sure to mount noatime
> >
> > I did chattr -R +A /var/lib/pgsql/data
> > that should do the trick as well or am I wrong?
>
> According to the man page it gives the same effect. There are a few things you should consider though:
> - new files won't be created with the same options (I think), so you'll have to run this command as a daily cronjob or something to that effect

This would be a really interesting point to know. I will look into this.

I think the advantage of chattr is that the last access time is still available for the rest of the filesystem. (Of course, you could have a separate filesystem just for the database stuff; in that case the advantage would disappear.)

regards,
Oli
Re: [PERFORM] Sanity check requested
I'm confused:

Ang Chin Han wrote:
> We've been using ext3fs for our production systems. (Red Hat Advanced Server 2.1)

Vincent van Leeuwen wrote:
> I'd upgrade to a journaling filesystem as soon as possible for reliability.

...About one year ago I considered moving to a journaling file system, but opted not to because it seems like that's what WAL does for us already. How does putting a journaling file system under it add more reliability?

I also guessed that a journaling file system would add overhead because now a write to the WAL file could itself be deferred and logged elsewhere.

...So now I'm really puzzled because folks are weighing in with solid anecdotal evidence saying that I'll get both better reliability and performance. Can someone explain what I'm missing about the concept?

-A puzzled Nick
Re: [PERFORM] Sanity check requested
Nick,

> ...About one year ago I considered moving to a journaling file system, but opted not to because it seems like that's what WAL does for us already. How does putting a journaling file system under it add more reliability?

It lets you restart your server quickly after an unexpected power-out. Ext2 is notoriously bad about this.

Also, WAL cannot necessarily recover properly if the underlying filesystem is corrupted.

> I also guessed that a journaling file system would add overhead because now a write to the WAL file could itself be deferred and logged elsewhere.

You are correct.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Sanity check requested
> ...About one year ago I considered moving to a journaling file system, but opted not to because it seems like that's what WAL does for us already. How does putting a journaling file system under it add more reliability?

WAL only works if the WAL files are actually written to disk and can be read off it again. Ext2 has a number of deficiencies which can cause problems with this basic operation (inode corruptions, etc). Journaling does not directly help.
Re: File systems (RE: [PERFORM] Sanity check requested)
Nick Fankhauser [EMAIL PROTECTED] writes:
> I'm departing in three ways from the simple IDE model that (I presume) the default random page cost of 4 is based on- The disks are SCSI RAID and the FS would be different.

Actually, the default 4 is based on experiments I did quite a while back on HPUX (with a SCSI disk) and Linux (with an IDE disk, and a different filesystem). I didn't see too much difference between 'em. RAID might alter the equation, or not.

regards, tom lane
Re: [PERFORM] Sanity check requested
Shridhar-

I appreciate your thoughts. I'll be running some before & after tests on this using one of our development/hot-swap boxes, so I'll report the results back to the list.

A few more thoughts/questions:

> 1. 30 users does not seem to be much of an overhead. If possible try doing away with connection pooling.

The application needs to scale up gracefully. We actually have about 200 users that could decide to log on at the same time; 30 is just a typical load. We'd also prefer to have 20,000 subscribers so we can start making a living with this business <g>.

> 2. While increasing sort memory, try 4/8/16 in that order. That way you will get a better picture of load behaviour. Though whatever you put appears reasonable, having more data always helps.

I'll try that approach while testing. Is it the case that the sort memory is allocated for each connection and becomes unavailable to other processes while the connection exists? If so, since I'm using a connection pool, I should be able to control total usage precisely. Without a connection pool, I could start starving the rest of the system for resources if the number of users spiked unexpectedly. Correct?

> 3. I don't know how this affects SCSI drives, but what file system are you using? Can you try different ones?
>
> 4. OK, this is too much but linux kernel 2.6 is in test and has vastly improved IO...

I'm using ext2. For now, I'll leave this and the OS version alone. If I change too many variables, I won't be able to discern which one is causing a change. Although I understand that there's an element of art to tuning, I'm enough of a neophyte that I don't have a feeling for the tuning parameters yet, and hence I have to take a scientific approach of just tweaking a few variables in an otherwise controlled and unchanged environment. If I can't reach my goals with the simple approach, I'll consider some of the more radical ideas.

Again, thanks for the ideas. I'll feed the results back after I've done some tests.

-Nick
Re: [PERFORM] Sanity check requested
Nick,

> I'll try that approach while testing. Is it the case that the sort memory is allocated for each connection and becomes unavailable to other processes while the connection exists? If so, since I'm using a connection pool, I should be able to control total usage precisely. Without a connection pool, I could start starving the rest of the system for resources if the number of users spiked unexpectedly. Correct?

Wrong, actually. Sort memory is allocated *per sort*, not per connection or per query. So a single complex query could easily use 4 x sort_mem if it has several merge joins ... and a pooled connection could use many times sort_mem depending on activity. Thus connection pooling does not help you with sort_mem usage at all, unless your pooling mechanism can control the rate at which queries are fed to the planner.

--
Josh Berkus
Aglio Database Solutions
San Francisco
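To put numbers on Josh's point that sort memory is allocated per sort, here is a back-of-the-envelope sketch. The helper name `worst_case_sort_kb` and the example figures (an 8192 KB sort_mem, 4 sorts per query, 30 concurrent queries) are assumptions for illustration only; real usage depends on the query plans actually chosen.

```python
def worst_case_sort_kb(sort_mem_kb: int,
                       sorts_per_query: int,
                       concurrent_queries: int) -> int:
    """Upper bound on sort memory if every sort uses its full allowance at once."""
    return sort_mem_kb * sorts_per_query * concurrent_queries


if __name__ == "__main__":
    # Assumed figures: sort_mem of 8192 KB, 4 sorts per query, 30 queries running.
    kb = worst_case_sort_kb(8192, 4, 30)
    print(f"{kb} KB (about {kb / 1024:.0f} MB)")  # prints "983040 KB (about 960 MB)"
```

The bound scales with the number of sorts in flight, not with the number of open connections, which is why a pool only helps if it limits how many queries run at once.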
Re: [PERFORM] Sanity check requested
> Wrong, actually. Sort memory is allocated *per sort*, not per connection or per query. So a single complex query could easily use 4 x sort_mem if it has several merge joins ...

Thanks for the correction. It sounds like this is one where usage can't be precisely controlled in a dynamic user environment, and I just need to get a feel for what works under a load that approximates my production system.

-Nick
Re: [PERFORM] Sanity check requested
Nick Fankhauser wrote:
> Thanks for the correction- it sounds like this is one where usage can't be precisely controlled in a dynamic user environment and I just need to get a feel for what works under a load that approximates my production system.

I think the most important point here is that if you set sort_mem too high, and you have a lot of simultaneous sorts, you can drive the server into swapping, which obviously is a very bad thing. You want it set as high as possible, but not so high given your usage patterns that you wind up swapping.

Joe
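Joe's rule of thumb (as high as possible without swapping) can be turned into a rough starting ceiling by dividing the memory you are willing to give to sorts by the peak number of simultaneous sorts you expect. The sketch below assumes exactly that simple model; `sort_mem_ceiling_kb` and all the figures are hypothetical, and the result is only a first guess to refine under a realistic load.

```python
def sort_mem_ceiling_kb(total_ram_mb: int,
                        reserved_mb: int,
                        peak_concurrent_sorts: int) -> int:
    """Largest sort_mem (in KB) that keeps all sorts within the spare RAM."""
    if peak_concurrent_sorts <= 0:
        raise ValueError("peak_concurrent_sorts must be positive")
    spare_kb = (total_ram_mb - reserved_mb) * 1024
    if spare_kb <= 0:
        raise ValueError("no memory left over for sorts")
    return spare_kb // peak_concurrent_sorts


if __name__ == "__main__":
    # Assumed figures: 2048 MB of RAM, 1024 MB reserved for the OS, shared
    # buffers and disk cache, and at most 120 sorts running at the same time.
    print(sort_mem_ceiling_kb(2048, 1024, 120))
```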