Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote: > On Sun, 2005-08-21 at 16:13 -0400, Ron wrote: > >>At 10:54 AM 8/21/2005, Jeremiah Jahn wrote: >> ... >>So you have 2 controllers each with 2 external slots? But you are >>currently only using 1 controller and only one external slot on that >>controller? > > > Sorry, no. I have one dual channel card in the system and two > controllers on the array. Dell PowerVault 220S w/ PERC4eDC-PCI Express > > ... >>Regardless of the above, each of these controllers should still be >>good for about 80-85% of 640MB/s, or ~510-540 MB/s apiece when doing >>raw sequential IO if you plug 3-4 fast enough HD's into each SCSI >>channel. Cheetah 15K.4's certainly are fast enough. Optimal setup >>is probably to split each RAID 1 pair so that one HD is on each of >>the SCSI channels, and then RAID 0 those pairs. That will also >>protect you from losing the entire disk subsystem if one of the SCSI >>channels dies. > > I like this idea, but how exactly does one bond the two channels > together? Won't this cause me to have both a /dev/sdb and a /dev/sdc? > Well, even if you did, you could always either use software raid, or lvm to turn it into a single volume. It also depends on what the controller card BIOS will let you get away with. Some cards would let you set up 4 RAID1's (one drive from each channel), and then create a RAID0 of those pairs. Software raid should do this without any problem. It can even be done such that it can be grown in the future, as well as work across multiple cards (though the latter is supported by some cards as well). > > >>That 128MB of buffer cache may very well be too small to keep the IO >>rate up, and/or there may be a more subtle problem with the LSI card, >>and/or you may have a configuration problem, but _something(s)_ need >>fixing since you are only getting raw sequential IO of ~100-150MB/s >>when it should be above 500MB/s. > > > It looks like there's a way to add more memory to it. 
This memory probably helps more in writing than reading. If you are reading the same area over and over, it might end up being a little bit of extra cache for that (but it should already be cached in system RAM, so you don't really get anything). ... >>Initial reads are only going to be as fast as your HD subsystem, so >>there's a reason for making the HD subsystem faster even if all you >>care about is reads. In addition, I'll repeat my previous advice >>that upgrading to 16GB of RAM would be well worth it for you. > > > 12GB is my max. I may run with it for a while and see. If your working set truly is 10GB, then you can get a massive performance increase even at 12GB. If your working set is 10GB and you have 6GB of RAM, it probably is always swapping out what it just read for the new stuff, even though you will read that same thing again in a few seconds. So rather than just paying for the 4GB that can't be cached, you pay for the whole 10. John =:-> > > >>Hope this helps, >>Ron Peacetree >> >> >> >>---(end of broadcast)--- >>TIP 9: In versions below 8.0, the planner will ignore your desire to >> choose an index scan if your joining column's datatypes do not >> match
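To make the "bond the two channels" idea from the exchange above concrete: with Linux software RAID you expose one drive per SCSI channel as separate devices, mirror across the channels, then stripe the mirrors. A hypothetical sketch — all device names are placeholders, sdb/sdd assumed on channel 0 and sdc/sde on channel 1, and these commands destroy data on those drives:

```shell
# RAID1 pairs, each with one member per SCSI channel, so a dead
# channel costs one half of each mirror rather than a whole array:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd /dev/sde

# RAID0 across the mirrors -- the "split each RAID 1 pair, then
# RAID 0 those pairs" layout Ron describes (chunk size in KB):
mdadm --create /dev/md2 --level=0 --chunk=64 --raid-devices=2 \
      /dev/md0 /dev/md1

# One volume, so no separate /dev/sdb and /dev/sdc to worry about:
mkfs.ext3 /dev/md2        # or hand /dev/md2 to LVM as a single PV
```

Newer mdadm can also build the same thing in one step with --level=10, but the two-step form above makes the channel pairing explicit.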
Re: [PERFORM] extremly low memory usage
Ron wrote: PERC4eDC-PCI Express, 128MB Cache, 2-External Channels Looks like they are using the LSI Logic MegaRAID SCSI 320-2E controller. IIUC, you have 2 of these, each with 2 external channels? A lot of people have mentioned Dell's versions of the LSI cards can be WAY slower than the ones you buy from LSI. Why is this the case? Nobody knows for sure. Here's a guess on my part. A while back, I was doing some googling. And instead of typing "LSI MegaRAID xxx", I just typed "MegaRAID xxx". Going beyond the initial pages, I saw Tekram -- a company that supposedly produces their own controllers -- listing products with the exact same model numbers and photos as cards from LSI and Areca. Seemed puzzling until I read a review about SATA RAID cards where it mentioned Tekram produces the Areca cards under their own name but using slower components to avoid competing at the high end with them. So what may be happening is that while the logic circuitry on the Dell PERCs is the same as on the source LSI cards, the speed of the RAID processor/RAM/internal buffers/etc. is not as fast, so Dell can shave off a few bucks on every server. That would mean that while a true LSI card has the processing power to do the RAID calculations for X drives, the Dell version probably can only do X*0.6 drives or so. The 128MB buffer also looks suspiciously small, and I do not see any upgrade path for it on LSI Logic's site. "Serious" RAID controllers from companies like Xyratex, Engenio, and Dot Hill can have up to 1-2GB. The card is upgradable. If you look at the pic of the card, it shows an SDRAM DIMM versus integrated RAM chips. I've also read reviews a while back comparing benchmarks of the 320-2 w/ 128MB versus 512MB onboard RAM. Their product literature is just nebulous on the RAM upgrade part. 
I'm sure if you opened up the PDF manuals, you could find the exact info. That 128MB of buffer cache may very well be too small to keep the IO rate up, and/or there may be a more subtle problem with the LSI card, and/or you may have a configuration problem, but _something(s)_ need fixing since you are only getting raw sequential IO of ~100-150MB/s when it should be above 500MB/s. I think it just might be the Dell hardware or the lack of 64-bit IOMMU on Xeons. Here are my numbers on a 320-1 w/ 128MB paired up with Opterons, compared to Jeremiah's. >> # time dd if=/dev/zero of=testfile bs=1024 count=1000000 >> 1000000+0 records in >> 1000000+0 records out >> >> real 0m8.885s >> user 0m0.299s >> sys 0m6.998s 2x15K RAID1 real 0m14.493s user 0m0.255s sys 0m11.712s 6x15K RAID10 (2x 320-1) real 0m9.986s user 0m0.200s sys 0m8.634s >> # time dd of=/dev/null if=testfile bs=1024 count=1000000 >> 1000000+0 records in >> 1000000+0 records out >> >> real 0m1.654s >> user 0m0.232s >> sys 0m1.415s 2x15K RAID1 real 0m3.383s user 0m0.176s sys 0m3.207s 6x15K RAID10 (2x 320-1) real 0m2.427s user 0m0.178s sys 0m2.250s If all 14 HDs are arranged in a RAID10 array, I'd say there's definitely something wrong with Jeremiah's hardware.
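The dd comparisons traded back and forth in this thread can be wrapped into a small script that reports MB/s directly. A rough sketch — the file size here is a tiny placeholder so it runs anywhere; for a meaningful result use a file at least twice the machine's RAM so the page cache can't satisfy the reads:

```shell
# Sequential write-then-read throughput check with dd.
SIZE_MB=32                 # placeholder; use >= 2x RAM on real hardware
dd if=/dev/zero of=dd_testfile bs=1M count=$SIZE_MB conv=fdatasync 2>/dev/null

# Time the read back.  (To truly defeat the cache on Linux, also do
# "echo 3 > /proc/sys/vm/drop_caches" as root between write and read.)
t0=$(date +%s.%N)
dd if=dd_testfile of=/dev/null bs=1M 2>/dev/null
t1=$(date +%s.%N)

rate=$(awk -v mb="$SIZE_MB" -v a="$t0" -v b="$t1" \
    'BEGIN { printf "%.1f", mb / (b - a) }')
echo "sequential read: ${rate} MB/s"
rm -f dd_testfile
```

conv=fdatasync makes the write timing include flushing to disk, so the write number isn't just the speed of dirtying page cache.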
Re: [PERFORM] extremly low memory usage
On Sun, 2005-08-21 at 16:13 -0400, Ron wrote: > At 10:54 AM 8/21/2005, Jeremiah Jahn wrote: > >On Sat, 2005-08-20 at 21:32 -0500, John A Meinel wrote: > > > Ron wrote: > > > > > > Well, since you can get a read of the RAID at 150MB/s, that means that > > > it is actual I/O speed. It may not be cached in RAM. Perhaps you could > > > try the same test, only using say 1G, which should be cached. > > > >[EMAIL PROTECTED] pgsql]# time dd if=/dev/zero of=testfile bs=1024 > >count=1000000 > >1000000+0 records in > >1000000+0 records out > > > >real 0m8.885s > >user 0m0.299s > >sys 0m6.998s > > This is abysmally slow. > > > >[EMAIL PROTECTED] pgsql]# time dd of=/dev/null if=testfile bs=1024 > >count=1000000 > >1000000+0 records in > >1000000+0 records out > > > >real 0m1.654s > >user 0m0.232s > >sys 0m1.415s > > This transfer rate is the only one out of the 4 you have posted that > is in the vicinity of where it should be. > > > >The raid array I have is currently set up to use a single channel. But I > >have dual controllers in the array. And dual external slots on the card. > >The machine is brand new and has pci-e backplane. > > > So you have 2 controllers each with 2 external slots? But you are > currently only using 1 controller and only one external slot on that > controller? Sorry, no. I have one dual channel card in the system and two controllers on the array. Dell PowerVault 220S w/ PERC4eDC-PCI Express > > > > > > Assuming these are U320 15Krpm 147GB HDs, a RAID 10 array of 14 of them > > > > doing raw sequential IO like this should be capable of > > > > ~7*75MB/s= 525MB/s using Seagate Cheetah 15K.4's > >BTW I'm using Seagate Cheetah 15K.4's > > OK, now we have that nailed down. > > > > > > AFAICT, the Dell PERC4 controllers use various flavors of the LSI Logic > > > > MegaRAID controllers. What I don't know is which exact one yours is, > > > > nor do I know if it (or any of the MegaRAID controllers) are high > > > > powered enough. 
> > > >PERC4eDC-PCI Express, 128MB Cache, 2-External Channels > > Looks like they are using the LSI Logic MegaRAID SCSI 320-2E > controller. IIUC, you have 2 of these, each with 2 external channels? > > The specs on these appear a bit strange. They are listed as being a > PCI-Ex8 card, which means they should have a max bandwidth of 20Gb/s= > 2GB/s, yet they are also listed as only supporting dual channel U320= > 640MB/s when they could easily support quad channel U320= > 1.28GB/s. Why bother building a PCI-Ex8 card when only a PCI-Ex4 > card (which is a more standard physical format) would've been > enough? Or if you are going to build a PCI-Ex8 card, why not support > quad channel U320? This smells like there's a problem with LSI's design. > > The 128MB buffer also looks suspiciously small, and I do not see any > upgrade path for it on LSI Logic's site. "Serious" RAID controllers > from companies like Xyratex, Engenio, and Dot Hill can have up to > 1-2GB of buffer, and there are sound technical reasons for it. See if > there's a buffer upgrade available or if you can get controllers that > have larger buffer capabilities. > > Regardless of the above, each of these controllers should still be > good for about 80-85% of 640MB/s, or ~510-540 MB/s apiece when doing > raw sequential IO if you plug 3-4 fast enough HD's into each SCSI > channel. Cheetah 15K.4's certainly are fast enough. Optimal setup > is probably to split each RAID 1 pair so that one HD is on each of > the SCSI channels, and then RAID 0 those pairs. That will also > protect you from losing the entire disk subsystem if one of the SCSI > channels dies. I like this idea, but how exactly does one bond the two channels together? Won't this cause me to have both a /dev/sdb and a /dev/sdc? 
> > That 128MB of buffer cache may very well be too small to keep the IO > rate up, and/or there may be a more subtle problem with the LSI card, > and/or you may have a configuration problem, but _something(s)_ need > fixing since you are only getting raw sequential IO of ~100-150MB/s > when it should be above 500MB/s. It looks like there's a way to add more memory to it. > > This will make the most difference for initial reads (first time you > load a table, first time you make a given query, etc) and for any writes. > > Your HW provider should be able to help you, even if some of the HW > in question needs to be changed. You paid for a solution. As long > as this stuff is performing at so much less than what it is supposed > to, you have not received the solution you paid for. > > BTW, on the subject of RAID stripes IME the sweet spot tends to be in > the 64KB to 256KB range (very large, very read heavy data mines can > want larger RAID stripes.). Only experimentation will tell you what > results in the best performance for your application. I think I have them very small at the moment. > > > >I'm not really worried about the writing, it's the reading that needs to be faster.
Re: [PERFORM] extremly low memory usage
At 10:54 AM 8/21/2005, Jeremiah Jahn wrote: On Sat, 2005-08-20 at 21:32 -0500, John A Meinel wrote: > Ron wrote: > > Well, since you can get a read of the RAID at 150MB/s, that means that > it is actual I/O speed. It may not be cached in RAM. Perhaps you could > try the same test, only using say 1G, which should be cached. [EMAIL PROTECTED] pgsql]# time dd if=/dev/zero of=testfile bs=1024 count=1000000 1000000+0 records in 1000000+0 records out real 0m8.885s user 0m0.299s sys 0m6.998s This is abysmally slow. [EMAIL PROTECTED] pgsql]# time dd of=/dev/null if=testfile bs=1024 count=1000000 1000000+0 records in 1000000+0 records out real 0m1.654s user 0m0.232s sys 0m1.415s This transfer rate is the only one out of the 4 you have posted that is in the vicinity of where it should be. The raid array I have is currently set up to use a single channel. But I have dual controllers in the array. And dual external slots on the card. The machine is brand new and has pci-e backplane. So you have 2 controllers each with 2 external slots? But you are currently only using 1 controller and only one external slot on that controller? > > Assuming these are U320 15Krpm 147GB HDs, a RAID 10 array of 14 of them > > doing raw sequential IO like this should be capable of > > ~7*75MB/s= 525MB/s using Seagate Cheetah 15K.4's BTW I'm using Seagate Cheetah 15K.4's OK, now we have that nailed down. > > AFAICT, the Dell PERC4 controllers use various flavors of the LSI Logic > > MegaRAID controllers. What I don't know is which exact one yours is, > > nor do I know if it (or any of the MegaRAID controllers) are high > > powered enough. PERC4eDC-PCI Express, 128MB Cache, 2-External Channels Looks like they are using the LSI Logic MegaRAID SCSI 320-2E controller. IIUC, you have 2 of these, each with 2 external channels? The specs on these appear a bit strange. 
They are listed as being a PCI-Ex8 card, which means they should have a max bandwidth of 20Gb/s= 2GB/s, yet they are also listed as only supporting dual channel U320= 640MB/s when they could easily support quad channel U320= 1.28GB/s. Why bother building a PCI-Ex8 card when only a PCI-Ex4 card (which is a more standard physical format) would've been enough? Or if you are going to build a PCI-Ex8 card, why not support quad channel U320? This smells like there's a problem with LSI's design. The 128MB buffer also looks suspiciously small, and I do not see any upgrade path for it on LSI Logic's site. "Serious" RAID controllers from companies like Xyratex, Engenio, and Dot Hill can have up to 1-2GB of buffer, and there are sound technical reasons for it. See if there's a buffer upgrade available or if you can get controllers that have larger buffer capabilities. Regardless of the above, each of these controllers should still be good for about 80-85% of 640MB/s, or ~510-540 MB/s apiece when doing raw sequential IO if you plug 3-4 fast enough HD's into each SCSI channel. Cheetah 15K.4's certainly are fast enough. Optimal setup is probably to split each RAID 1 pair so that one HD is on each of the SCSI channels, and then RAID 0 those pairs. That will also protect you from losing the entire disk subsystem if one of the SCSI channels dies. That 128MB of buffer cache may very well be too small to keep the IO rate up, and/or there may be a more subtle problem with the LSI card, and/or you may have a configuration problem, but _something(s)_ need fixing since you are only getting raw sequential IO of ~100-150MB/s when it should be above 500MB/s. This will make the most difference for initial reads (first time you load a table, first time you make a given query, etc) and for any writes. Your HW provider should be able to help you, even if some of the HW in question needs to be changed. You paid for a solution. 
As long as this stuff is performing at so much less than what it is supposed to, you have not received the solution you paid for. BTW, on the subject of RAID stripes IME the sweet spot tends to be in the 64KB to 256KB range (very large, very read-heavy data mines can want larger RAID stripes). Only experimentation will tell you what results in the best performance for your application. I'm not really worried about the writing, it's the reading that needs to be faster. Initial reads are only going to be as fast as your HD subsystem, so there's a reason for making the HD subsystem faster even if all you care about is reads. In addition, I'll repeat my previous advice that upgrading to 16GB of RAM would be well worth it for you. Hope this helps, Ron Peacetree
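"Only experimentation will tell you" about stripe size translates, for the software-RAID alternative discussed elsewhere in this thread, into a simple sweep. A hypothetical sketch — /dev/md0 and the member drives are placeholders, the loop destroys any data on them, and oflag=direct needs a reasonably recent GNU dd:

```shell
# Sweep Ron's suggested 64KB-256KB chunk-size range and time a raw
# sequential write at each setting (add a matching read test to taste).
for chunk in 64 128 256; do
    mdadm --stop /dev/md0 2>/dev/null
    mdadm --create --run /dev/md0 --level=0 --chunk=$chunk \
          --raid-devices=2 /dev/sdb /dev/sdc
    echo "chunk ${chunk}KB:"
    # oflag=direct bypasses the page cache so the array, not RAM,
    # is what gets measured:
    dd if=/dev/zero of=/dev/md0 bs=1M count=4096 oflag=direct 2>&1 | tail -1
done
```

The winner depends on the workload's typical request size, which is why no fixed recommendation beats measuring.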
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote: On Sat, 2005-08-20 at 21:32 -0500, John A Meinel wrote: Ron wrote: At 02:53 PM 8/20/2005, Jeremiah Jahn wrote: Well, since you can get a read of the RAID at 150MB/s, that means that it is actual I/O speed. It may not be cached in RAM. Perhaps you could try the same test, only using say 1G, which should be cached. [EMAIL PROTECTED] pgsql]# time dd if=/dev/zero of=testfile bs=1024 count=1000000 1000000+0 records in 1000000+0 records out real 0m8.885s user 0m0.299s sys 0m6.998s [EMAIL PROTECTED] pgsql]# time dd of=/dev/null if=testfile bs=1024 count=1000000 1000000+0 records in 1000000+0 records out real 0m1.654s user 0m0.232s sys 0m1.415s The write time seems about the same (but you only have 128MB of write cache), but your read jumped up to 620MB/s. So your drives do seem to be giving you 150MB/s. ... I'm actually curious about PCI bus saturation at this point. Old 32-bit 33MHz PCI could only push 1Gbit = 100MB/s. Now, I'm guessing that this is a higher performance system. But I'm really surprised that your write speed is that close to your read speed. (100MB/s write, 150MB/s read). The raid array I have is currently set up to use a single channel. But I have dual controllers in the array. And dual external slots on the card. The machine is brand new and has pci-e backplane. Assuming these are U320 15Krpm 147GB HDs, a RAID 10 array of 14 of them doing raw sequential IO like this should be capable of ~7*75MB/s= 525MB/s using Seagate Cheetah 15K.4's, ~7*79MB/s= 553MB/s BTW I'm using Seagate Cheetah 15K.4's Now, are the numbers that Ron is quoting in megabytes or megabits? I'm guessing he knows what he is talking about, and is doing megabytes. 80MB/s sustained seems rather high for a hard-disk. Though this page: http://www.storagereview.com/articles/200411/20041116ST3146754LW_2.html Does seem to agree with that statement. (Between 56 and 93MB/s) And since U320 is a 320MB/s bus, it doesn't seem like anything there should be saturating. 
So why the low performance _IF_ the controller setup is high powered enough to keep that kind of IO rate up. This will require a controller or controllers providing dual channel U320 bandwidth externally and quad channel U320 bandwidth internally. IOW, it needs a controller or controllers talking 64b 133MHz PCI-X, reasonably fast DSP/CPU units, and probably a decent sized IO buffer as well. AFAICT, the Dell PERC4 controllers use various flavors of the LSI Logic MegaRAID controllers. What I don't know is which exact one yours is, nor do I know if it (or any of the MegaRAID controllers) are high powered enough. PERC4eDC-PCI Express, 128MB Cache, 2-External Channels Do you know which card it is? Does it look like this one: http://www.lsilogic.com/products/megaraid/megaraid_320_2e.html Judging by the 320 speed, and 2 external controllers, that is my guess. They at least claim a theoretical max of 2GB/s. Which makes you wonder why reading from RAM is only able to get throughput of 600MB/s. Did you run it multiple times? On my windows system, I get just under 550MB/s for what should be cached, copying from /dev/zero to /dev/null I get 2.4GB/s (though that might be a no-op). On a similar linux machine, I'm able to get 1200MB/s for a cached file. (And 3GB/s for a zero=>null copy). John =:-> Talk to your HW supplier to make sure you have controllers adequate to your HD's. ...and yes, your average access time will be in the 5.5ms - 6ms range when doing a physical seek. Even with RAID, you want to minimize seeks and maximize sequential IO when accessing them. Best to not go to HD at all ;-) Well, certainly, if you can get more into RAM, you're always better off. For writing, a battery-backed write cache, and for reading lots of system RAM. I'm not really worried about the writing, it's the reading that needs to be faster. Hope this helps, Ron Peacetree John =:->
Re: [PERFORM] extremly low memory usage
On Sat, 2005-08-20 at 21:32 -0500, John A Meinel wrote: > Ron wrote: > > At 02:53 PM 8/20/2005, Jeremiah Jahn wrote: > > > >> On Fri, 2005-08-19 at 16:03 -0500, John A Meinel wrote: > >> > Jeremiah Jahn wrote: > >> > > On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote: > >> > > > >> > >> > > >> > > it's cached alright. I'm getting a read rate of about 150MB/sec. I > >> would > >> > > have thought it would be faster with my raid setup. I think I'm > >> going to > >> > > scrap the whole thing and get rid of LVM. I'll just do a straight > >> ext3 > >> > > system. Maybe that will help. Still trying to get suggestions for a > >> > > stripe size. > >> > > > > Well, since you can get a read of the RAID at 150MB/s, that means that > it is actual I/O speed. It may not be cached in RAM. Perhaps you could > try the same test, only using say 1G, which should be cached. [EMAIL PROTECTED] pgsql]# time dd if=/dev/zero of=testfile bs=1024 count=1000000 1000000+0 records in 1000000+0 records out real 0m8.885s user 0m0.299s sys 0m6.998s [EMAIL PROTECTED] pgsql]# time dd of=/dev/null if=testfile bs=1024 count=1000000 1000000+0 records in 1000000+0 records out real 0m1.654s user 0m0.232s sys 0m1.415s > > >> > > >> > I don't think 150MB/s is out of the realm for a 14 drive array. 
> >> > How fast is time dd if=/dev/zero of=testfile bs=8192 count=1000000 > >> > > >> time dd if=/dev/zero of=testfile bs=8192 count=1000000 > >> 1000000+0 records in > >> 1000000+0 records out > >> > >> real 1m24.248s > >> user 0m0.381s > >> sys 0m33.028s > >> > >> > >> > (That should create an 8GB file, which is too big to cache everything) > >> > And then how fast is: > >> > time dd if=testfile of=/dev/null bs=8192 count=1000000 > >> > >> time dd if=testfile of=/dev/null bs=8192 count=1000000 > >> 1000000+0 records in > >> 1000000+0 records out > >> > >> real 0m54.139s > >> user 0m0.326s > >> sys 0m8.916s > >> > >> > >> and on a second run: > >> > >> real 0m55.667s > >> user 0m0.341s > >> sys 0m9.013s > >> > >> > >> > > >> > That should give you a semi-decent way of measuring how fast the RAID > >> > system is, since it should be too big to cache in ram. > >> > >> about 150MB/Sec. Is there no better way to make this go faster...? > > I'm actually curious about PCI bus saturation at this point. Old 32-bit > 33MHz pci could only push 1Gbit = 100MB/s. Now, I'm guessing that this > is a higher performance system. But I'm really surprised that your write > speed is that close to your read speed. (100MB/s write, 150MB/s read). The raid array I have is currently set up to use a single channel. But I have dual controllers in the array. And dual external slots on the card. The machine is brand new and has pci-e backplane. > > > > > Assuming these are U320 15Krpm 147GB HDs, a RAID 10 array of 14 of them > > doing raw sequential IO like this should be capable of > > ~7*75MB/s= 525MB/s using Seagate Cheetah 15K.4's, ~7*79MB/s= 553MB/s BTW I'm using Seagate Cheetah 15K.4's > > if using Fujitsu MAU's, and ~7*86MB/s= 602MB/s if using Maxtor Atlas 15K > > II's to devices external to the RAID array. > > I know I thought these were SATA drives, over 2 controllers. I could be > completely wrong, though. > > > > > _IF_ the controller setup is high powered enough to keep that kind of IO > > rate up. 
This will require a controller or controllers providing dual > > channel U320 bandwidth externally and quad channel U320 bandwidth > > internally. IOW, it needs a controller or controllers talking 64b > > 133MHz PCI-X, reasonably fast DSP/CPU units, and probably a decent sized > > IO buffer as well. > > > > AFAICT, the Dell PERC4 controllers use various flavors of the LSI Logic > > MegaRAID controllers. What I don't know is which exact one yours is, > > nor do I know if it (or any of the MegaRAID controllers) are high > > powered enough. PERC4eDC-PCI Express, 128MB Cache, 2-External Channels > > > > Talk to your HW supplier to make sure you have controllers adequate to > > your HD's. > > > > ...and yes, your average access time will be in the 5.5ms - 6ms range > > when doing a physical seek. > > Even with RAID, you want to minimize seeks and maximize sequential IO > > when accessing them. > > Best to not go to HD at all ;-) > > Well, certainly, if you can get more into RAM, you're always better off. > For writing, a battery-backed write cache, and for reading lots of > system RAM. I'm not really worried about the writing, it's the reading that needs to be faster. > > > > > Hope this helps, > > Ron Peacetree > > > > John > =:-> -- Speak softly and carry a +6 two-handed sword.
Re: [PERFORM] extremly low memory usage
I'm sorry that I wrote that the option would risk the log persistency with PostgreSQL; I should instead have asked how you have taken this into account. Tom Lane's email below convinces me that you have taken metadata-only journalling into account and still preserve the persistency of committed transactions. This means that ext3 with data=writeback is safe with PostgreSQL even with a hardware reset button. Metadata-only journalling is faster when it can be used. I didn't know that any database could keep its guarantees with the metadata-only journalling option. I looked at your problem. One of the problems is that you need to keep certain data cached in memory all the time. That could be solved by running SELECT COUNT(*) FROM to_be_cached; as a cron job. It loads the whole table into the Linux kernel page cache. Marko Ristola Tom Lane wrote: >Right. I think the optimal setting for a Postgres data directory is >journaled metadata, non-journaled file content. Postgres can take care >of the data integrity for itself, but it does assume that the filesystem >stays structurally sane (eg, data blocks don't get reassigned to the >wrong file), so you need a filesystem guarantee about the metadata. > >WAL files are handled in a much more conservative way (created, filled >with zeroes, and fsync'd before we ever put any valuable data in 'em). >If you have WAL on its own drive then I think Mike's recommendation of >no filesystem journalling at all for that drive is probably OK. Or >you can do same as above (journal metadata only) if you want a little >extra protection. > >And of course all this reasoning depends on the assumption that the >drive tells the truth about write-completion. If the drive does write >caching it had better be able to complete all its accepted writes before >dying in a power failure. (Hence, battery-backed write cache is OK, any >other kind is evil.) 
> regards, tom lane
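Tom's "journaled metadata, non-journaled file content" recommendation maps onto ext3's data=writeback mode. A sketch of how that might be configured — the device names and mount points are placeholders, and this applies only to the PostgreSQL data/WAL filesystems, not to everything on the box:

```shell
# /etc/fstab sketch: metadata-only journalling for the data directory,
# and likewise for a WAL volume on its own spindle.
#
#   /dev/sdb1  /var/lib/pgsql/data  ext3  noatime,data=writeback  1 2
#   /dev/sdc1  /var/lib/pgsql/wal   ext3  noatime,data=writeback  1 2

# The data= mode generally can't be switched by a live remount; either
# set it in fstab and remount from scratch, or bake it into the
# filesystem's default mount options:
tune2fs -o journal_data_writeback /dev/sdb1
```

With data=writeback only metadata goes through the journal, which is exactly the guarantee Postgres needs (structural sanity) without paying to journal file contents it already protects via WAL.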
Re: [PERFORM] extremly low memory usage
Ron wrote: Oops. There's a misconception here. ... OTOH, access time is _latency_, and that is not changed. Access time for a RAID set is equal to that of the slowest access time, AKA highest latency, HD in the RAID set. You're overgeneralizing from one specific type of raid, aren't you? The comp.arch.storage FAQ explicitly states that "raid level 1... shorter latency... since resolution can be from any of multiple disks". Other raid configurations increase latency (raid 3 increases latency from an average of 1/2 revolution to near 1 revolution); but no one really uses those. Ron Mayer
Re: [PERFORM] extremly low memory usage
Ron wrote: At 02:53 PM 8/20/2005, Jeremiah Jahn wrote: On Fri, 2005-08-19 at 16:03 -0500, John A Meinel wrote: > Jeremiah Jahn wrote: > > On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote: > > > > > it's cached alright. I'm getting a read rate of about 150MB/sec. I would > > have thought it would be faster with my raid setup. I think I'm going to > > scrap the whole thing and get rid of LVM. I'll just do a straight ext3 > > system. Maybe that will help. Still trying to get suggestions for a > > stripe size.

Well, since you can get a read of the RAID at 150MB/s, that means that it is actual I/O speed. It may not be cached in RAM. Perhaps you could try the same test, only using say 1G, which should be cached.

> I don't think 150MB/s is out of the realm for a 14 drive array. > How fast is > time dd if=/dev/zero of=testfile bs=8192 count=1000000

time dd if=/dev/zero of=testfile bs=8192 count=1000000
1000000+0 records in
1000000+0 records out
real 1m24.248s
user 0m0.381s
sys  0m33.028s

> (That should create an 8GB file, which is too big to cache everything) > And then how fast is: > time dd if=testfile of=/dev/null bs=8192 count=1000000

time dd if=testfile of=/dev/null bs=8192 count=1000000
1000000+0 records in
1000000+0 records out
real 0m54.139s
user 0m0.326s
sys  0m8.916s

and on a second run:

real 0m55.667s
user 0m0.341s
sys  0m9.013s

> > That should give you a semi-decent way of measuring how fast the RAID > system is, since it should be too big to cache in ram.

about 150MB/Sec. Is there no better way to make this go faster...?

I'm actually curious about PCI bus saturation at this point. Old 32-bit 33MHz PCI could only push 1Gbit = 100MB/s. Now, I'm guessing that this is a higher performance system. But I'm really surprised that your write speed is that close to your read speed. (100MB/s write, 150MB/s read).
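As a sanity check, the dd timings quoted above (an ~8GB file written in ~84s and read in ~54s) imply exactly the throughput figures being discussed; the arithmetic below assumes the file was 8192 B x 1,000,000 blocks:

```shell
# Back-of-envelope MB/s from the quoted dd timings (integer arithmetic,
# so results are rounded down).
bytes=$((8192 * 1000000))                       # ~8.2 GB test file
echo "write: $((bytes / 84 / 1000000)) MB/s"    # ~84 s elapsed -> ~97 MB/s
echo "read:  $((bytes / 54 / 1000000)) MB/s"    # ~54 s elapsed -> ~151 MB/s
```

That matches the ~100MB/s write and ~150MB/s read cited in the thread.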
Assuming these are U320 15Krpm 147GB HDs, a RAID 10 array of 14 of them doing raw sequential IO like this should be capable of ~7*75MB/s= 525MB/s using Seagate Cheetah 15K.4's, ~7*79MB/s= 553MB/s if using Fujitsu MAU's, and ~7*86MB/s= 602MB/s if using Maxtor Atlas 15K II's to devices external to the RAID array.

I know I thought these were SATA drives, over 2 controllers. I could be completely wrong, though.

_IF_ the controller setup is high powered enough to keep that kind of IO rate up. This will require a controller or controllers providing dual channel U320 bandwidth externally and quad channel U320 bandwidth internally. IOW, it needs a controller or controllers talking 64b 133MHz PCI-X, reasonably fast DSP/CPU units, and probably a decent sized IO buffer as well. AFAICT, the Dell PERC4 controllers use various flavors of the LSI Logic MegaRAID controllers. What I don't know is which exact one yours is, nor do I know if it (or any of the MegaRAID controllers) are high powered enough. Talk to your HW supplier to make sure you have controllers adequate to your HD's.

...and yes, your average access time will be in the 5.5ms - 6ms range when doing a physical seek. Even with RAID, you want to minimize seeks and maximize sequential IO when accessing them. Best to not go to HD at all ;-)

Well, certainly, if you can get more into RAM, you're always better off. For writing, a battery-backed write cache, and for reading, lots of system RAM.

Hope this helps, Ron Peacetree

John =:->
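Ron's RAID 10 estimate above is easy to reproduce: 14 drives form 7 mirrored pairs, and sequential reads stripe across the pairs, so aggregate bandwidth is roughly 7x a single drive's sustained rate:

```shell
# 7 mirrored pairs x per-drive sustained MB/s (figures quoted above for
# the Cheetah 15K.4, Fujitsu MAU, and Atlas 15K II respectively).
pairs=$((14 / 2))
for rate in 75 79 86; do
    echo "$rate MB/s per drive -> $((pairs * rate)) MB/s aggregate"
done
```

Against the measured ~150MB/s, that is why Ron concludes something in the controller path is the bottleneck.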
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote: I'm just watching gnome-system-monitor. Which, after careful consideration and looking at dstat, means I'm on CRACK. GSM isn't showing cached memory usage. I assume that the cache memory usage is where data off of the disks would be cached...?

Well, a simple "free" also tells you how much has been cached. I believe, by reading the _cach column, it looks like you have 4.6G cached. So you are indeed using memory. I'm still concerned why it seems to be taking 3-4ms per index lookup, when things should already be cached in RAM. Now, I may be wrong about whether the indexes are cached, but I sure would expect them to be. What is the time for a cached query on your system (with normal nested loops)? (Give the EXPLAIN ANALYZE for the *second* run, or maybe the fourth.) I'm glad that we aren't seeing something weird with your kernel, at least.

John =:->

memory output from dstat is this for a few seconds:

---procs--- ------memory-usage------ ---paging-- ---disk/sda--:--disk/sdb-- ---swap--- ----total-cpu-usage----
run blk new|_used _buff _cach _free|__in_ _out_|_read write:_read write|_used _free|usr sys idl wai hiq siq
 0   0   0|1336M  10M 4603M  17M| 490B  833B|3823B 3503k:1607k 4285k|160k 2048M|  4   1  89   7   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0   464k|160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0    48k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0   132k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0    36k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0    12k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 2   0   0|1353M  10M 4585M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   1  75   0   0   0
 1   0   0|1321M  10M 4616M  19M|   0     0 |   0     0 :   0     0 |160k 2048M| 18   8  74   0   0   0
 1   0   0|1326M  10M 4614M  17M|   0     0 |   0     0 :4096B    0 |160k 2048M| 16  10  74   1   0   0
 1   0   0|1330M  10M 4609M  17M|   0     0 |   0    12k:4096B    0 |160k 2048M| 17   9  74   0   0   0
 0   1   0|1343M  10M 4596M  17M|   0     0 |   0     0 :   0   316M|160k 2048M|  5  10  74  11   0   1
 0   1   0|1339M  10M 4596M  21M|   0     0 |   0     0 :   0     0 |160k 2048M|  0   0  74  25   0   1
 0   2   0|1334M  10M 4596M  25M|   0     0 |   0  4096B:   0     0 |160k 2048M|  0   0  54  44   0   1
 1   0   0|1326M  10M 4596M  34M|   0     0 |   0     0 :   0   364k|160k 2048M|  4   1  60  34   0   1
 1   0   0|1290M  10M 4596M  70M|   0     0 |   0    12k:   0     0 |160k 2048M| 24   1  75   0   0   0
 1   0   0|1301M  10M 4596M  59M|   0     0 |   0    20k:   0     0 |160k 2048M| 21   4  75   0   0   0
 1   0   0|1312M  10M 4596M  48M|   0     0 |   0     0 :   0     0 |160k 2048M| 22   4  75   0   0   0
 1   0   0|1323M  10M 4596M  37M|   0     0 |   0     0 :   0    24k|160k 2048M| 21   4  75   0   0   0
 1   0   0|1334M  10M 4596M  25M|   0     0 |   0     0 :   0    56k|160k 2048M| 21   4  75   0   0   0

On Fri, 2005-08-19 at 16:07 -0500, John A Meinel wrote: Jeremiah Jahn wrote: Rebuild in progress with just ext3 on the raid array...will see if this helps the access times. If it doesn't I'll mess with the stripe size. I have REINDEXED, CLUSTERED, tablespaced and cached with 'cat table/index > /dev/null'; none of this seems to have helped, or even increased my memory usage. argh! The only thing about this new system that I'm unfamiliar with is the array setup and LVM, which is why I think that's where the issue is. Clustering and indexing as well as vacuum etc. are things that I do and have been aware of for some time. Perhaps slony is a factor, but I really don't see it causing problems on index read speed, esp. when it's not running. thanx for your help, I really appreciate it. -jj-

By the way, how are you measuring memory usage? Can you give the output of that command, just to make sure you are reading it correctly.

John =:->
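The cache figure John refers to (the "_cach" column in dstat, the "cached" column in free) comes from /proc/meminfo on Linux; a quick way to pull the same number, shown here against a sample line so the parsing is clear:

```shell
# Print the page-cache size the same way free's "cached" column does.
# The sample value approximates the ~4.6G reported in the thread.
sample='Cached:        4603000 kB'
echo "$sample" | awk '/^Cached:/ {printf "%.1f GB cached\n", $2 / 1024 / 1024}'
# On a live system:  awk '/^Cached:/ {print $2 " kB"}' /proc/meminfo
```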
Re: [PERFORM] extremly low memory usage
At 02:16 PM 8/20/2005, Jeremiah Jahn wrote: I'm just watching gnome-system-monitor. Which, after careful consideration and looking at dstat, means I'm on CRACK. GSM isn't showing cached memory usage. I assume that the cache memory usage is where data off of the disks would be cached...? memory output from dstat is this for a few seconds:

---procs--- ------memory-usage------ ---paging-- ---disk/sda--:--disk/sdb-- ---swap--- ----total-cpu-usage----
run blk new|_used _buff _cach _free|__in_ _out_|_read write:_read write|_used _free|usr sys idl wai hiq siq
 0   0   0|1336M  10M 4603M  17M| 490B  833B|3823B 3503k:1607k 4285k|160k 2048M|  4   1  89   7   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0   464k|160k 2048M| 25   0  75   0   0   0
 1   0   0|1334M  10M 4596M  25M|   0     0 |   0     0 :   0    56k|160k 2048M| 21   4  75   0   0   0

Then the "low memory usage" was a chimera. Excellent!

Given the evidence in this thread, IMO you should upgrade your box to 16GB of RAM ASAP. That should be enough to cache most, if not all, of the 10GB of the "hot" part of your DB; thereby dedicating your HD subsystem as much as possible to writes (which are unavoidable HD IO). As I've posted before, at $75-$150/GB, it's well worth the investment whenever you can prove it will help, as we have here.

Hope this helps, Ron Peacetree
Re: [PERFORM] extremly low memory usage
I'm reposting this because my mailer hiccuped when I sent it the first time. If this results in a double post, I apologize.

At 02:53 PM 8/20/2005, Jeremiah Jahn wrote: On Fri, 2005-08-19 at 16:03 -0500, John A Meinel wrote: > Jeremiah Jahn wrote: > > On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote: > > > > > it's cached alright. I'm getting a read rate of about 150MB/sec. I would > > have thought it would be faster with my raid setup. I think I'm going to > > scrap the whole thing and get rid of LVM. I'll just do a straight ext3 > > system. Maybe that will help. Still trying to get suggestions for a > > stripe size. > > > > I don't think 150MB/s is out of the realm for a 14 drive array. > How fast is > time dd if=/dev/zero of=testfile bs=8192 count=1000000

time dd if=/dev/zero of=testfile bs=8192 count=1000000
1000000+0 records in
1000000+0 records out
real 1m24.248s
user 0m0.381s
sys  0m33.028s

> (That should create an 8GB file, which is too big to cache everything) > And then how fast is: > time dd if=testfile of=/dev/null bs=8192 count=1000000

time dd if=testfile of=/dev/null bs=8192 count=1000000
1000000+0 records in
1000000+0 records out
real 0m54.139s
user 0m0.326s
sys  0m8.916s

and on a second run:

real 0m55.667s
user 0m0.341s
sys  0m9.013s

> > That should give you a semi-decent way of measuring how fast the RAID > system is, since it should be too big to cache in ram.

about 150MB/Sec. Is there no better way to make this go faster...?

Assuming these are U320 15Krpm 147GB HDs, a RAID 10 array of 14 of them doing raw sequential IO like this should be capable of ~7*75MB/s= 525MB/s using Seagate Cheetah 15K.4's, ~7*79MB/s= 553MB/s if using Fujitsu MAU's, and ~7*86MB/s= 602MB/s if using Maxtor Atlas 15K II's to devices external to the RAID array. _IF_ the controller setup is high powered enough to keep that kind of IO rate up. This will require a controller or controllers providing dual channel U320 bandwidth externally and quad channel U320 bandwidth internally.
IOW, it needs a controller or controllers talking 64b 133MHz PCI-X, reasonably fast DSP/CPU units, and probably a decent sized IO buffer as well. AFAICT, the Dell PERC4 controllers use various flavors of the LSI Logic MegaRAID controllers. What I don't know is which exact one yours is, nor do I know if it (or any of the MegaRAID controllers) are high powered enough. Talk to your HW supplier to make sure you have controllers adequate to your HD's. ...and yes, your average access time will be in the 5.5ms - 6ms range when doing a physical seek. Even with RAID, you want to minimize seeks and maximize sequential IO when accessing them. Best to not go to HD at all ;-) Hope this helps, Ron Peacetree
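The 5.5-6 ms average access time Ron cites decomposes into rotational latency plus seek time; for a 15Krpm drive:

```shell
# One revolution at 15,000 rpm, in microseconds, and the average
# rotational latency (half a revolution); add a typical ~3.5-4 ms
# average seek to reach the quoted 5.5-6 ms access time.
rev_us=$((60 * 1000000 / 15000))
echo "revolution: ${rev_us} us (~4 ms)"
echo "avg rotational latency: $((rev_us / 2)) us (~2 ms)"
```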
Re: [PERFORM] extremly low memory usage
On Fri, 2005-08-19 at 16:03 -0500, John A Meinel wrote: > Jeremiah Jahn wrote: > > On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote: > > > >>Jeremiah Jahn wrote: > >> > > > ... > > >> > >>Well, in general, 3ms for a single lookup seems really long. Maybe your > >>index is bloated by not vacuuming often enough. Do you tend to get a lot > >>of updates to litigant_details? > > > > I have vacuumed this already. I get lots of updates, but this data is > > mostly unchanging. > > > > > >>There are a couple possibilities at this point. First, you can REINDEX > >>the appropriate index, and see if that helps. However, if this is a test > >>box, it sounds like you just did a dump and reload, which wouldn't have > >>bloat in an index. > > > > > > I loaded it using slony > > I don't know that slony versus pg_dump/pg_restore really matters. The > big thing is that Updates wouldn't be trashing your index. > But if you are saying that you cluster once/wk your index can't be that > messed up anyway. (Unless CLUSTER messes up the non-clustered indexes, > but that would make cluster much less useful, so I would have guessed > this was not the case) > > > > > > >>Another possibility. Is this the column that you usually use when > >>pulling information out of litigant_details? If so, you can CLUSTER > >>litigant_details on the appropriate index. This will help things be > >>close together that should be, which decreases the index lookup costs. > > > > clustering on this right now. Most of the other things are already > > clustered. name and case_data > > Just as a reality check, they are clustered on the columns in question, > right? (I don't know if this column is a primary key or not, but any > index can be used for clustering). > > > > > > >>However, if this is not the common column, then you probably will slow > >>down whatever other accesses you may have on this table. 
> >> > >>After CLUSTER, the current data will stay clustered, but new data will > >>not, so you have to continually CLUSTER, the same way that you might > >>VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as > >>expensive as a VACUUM FULL. Be aware of this, but it might vastly > >>improve your performance, so it would be worth it. > > > > I generally re-cluster once a week. > > > >>> > So there is no need for preloading your indexes on the identity table. > It is definitely not the bottleneck. > > So a few design bits, which may help your database. > Why is "actor_id" a text field instead of a number? > >>> > >>>This is simply due to the nature of the data. > >>> > >> > >>I'm just wondering if changing into a number, and using a number->name > >>lookup would be faster for you. It may not be. In general, I prefer to > >>use numbers for references. I may be over paranoid, but I know that some > >>locales are bad with string -> string comparisons. And since the data in > >>your database is stored as UNICODE, I'm not sure if it has to do any > >>translating or not. Again, something to consider, it may not make any > >>difference. > > > > I don't believe so. I initialize the DB as 'lang=C'. I used to have the > > problem where things were being inited as en_US. This would prevent any > > text based index from working. This doesn't seem to be the case here, so > > I'm not worried about it. > > > > Sorry, I think I was confusing you with someone else who posted SHOW ALL. > > > > > > >> > > ... > > > it's cached alright. I'm getting a read rate of about 150MB/sec. I would > > have thought it would be faster with my raid setup. I think I'm going to > > scrap the whole thing and get rid of LVM. I'll just do a straight ext3 > > system. Maybe that will help. Still trying to get suggestions for a > > stripe size. > > > > I don't think 150MB/s is out of the realm for a 14 drive array.
> How fast is > time dd if=/dev/zero of=testfile bs=8192 count=1000000

time dd if=/dev/zero of=testfile bs=8192 count=1000000
1000000+0 records in
1000000+0 records out
real 1m24.248s
user 0m0.381s
sys  0m33.028s

> (That should create an 8GB file, which is too big to cache everything) > And then how fast is: > time dd if=testfile of=/dev/null bs=8192 count=1000000

time dd if=testfile of=/dev/null bs=8192 count=1000000
1000000+0 records in
1000000+0 records out
real 0m54.139s
user 0m0.326s
sys  0m8.916s

and on a second run:

real 0m55.667s
user 0m0.341s
sys  0m9.013s

> > That should give you a semi-decent way of measuring how fast the RAID > system is, since it should be too big to cache in ram.

about 150MB/Sec. Is there no better way to make this go faster...?

> > > > >>I can point you to REINDEX and CLUSTER, but if it is caching in ram, I > >>honestly can't say why the per loop would be that much slower. > >>Are both systems running the same postgres version? It sounds like it is > >>different (since you say something about switching to 8.0). > > > > These had little or no effect. > > The produc
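The weekly re-CLUSTER routine mentioned above could be automated with a cron entry like the following (a sketch only: the database name and the off-hours window are assumptions, and note that CLUSTER takes an exclusive lock on the table for its whole duration):

```shell
# Hypothetical crontab entry: re-cluster litigant_details every Sunday
# at 03:00. With no index named, CLUSTER reuses the index the table
# was previously clustered on.
0 3 * * 0  psql -d mydb -c 'CLUSTER litigant_details;' >/dev/null 2>&1
```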
Re: [PERFORM] extremly low memory usage
On Sat, 2005-08-20 at 11:59 -0400, Ron wrote: > At 04:11 PM 8/19/2005, Jeremiah Jahn wrote: > >On Fri, 2005-08-19 at 14:23 -0500, John A Meinel wrote: > > > Ron wrote: > > > > At 01:18 PM 8/19/2005, John A Meinel wrote: > > > > > > > >> Jeremiah Jahn wrote: > > > >> > Sorry about the formatting. > > > >> > > > > >> > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > > > >> > > > > >> >>Jeremiah Jahn wrote: > > > >> >> > > > >> >> > > > >> > > > >> ... > > > >> > > > >> >>The expensive parts are the 4915 lookups into the litigant_details > > > >> (each > > > >> >>one takes approx 4ms for a total of ~20s). > > > >> >>And then you do it again on case_data (average 3ms each * 4906 loops > > > >> >>= > > > >> >>~15s). > > > >> > > > > >> > Is there some way to avoid this? > > > >> > > > > >> > > > >> Well, in general, 3ms for a single lookup seems really long. Maybe your > > > >> index is bloated by not vacuuming often enough. Do you tend to get a > > > >> lot > > > >> of updates to litigant_details? > > > > > > > > > > > > Given that the average access time for a 15Krpm HD is in the 5.5-6ms > > > > range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single > > > > lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case) > > > > table accesses is requiring a seek. > > > > > >I think LVM may be a problem, since it also seems to break things up on > >the file system. My access time on the seek should be around 1/7th the > >15Krpm I believe since it's a 14 disk raid 10 array. And no other > >traffic at the moment. > > Oops. There's a misconception here. RAID arrays increase > _throughput_ AKA _bandwidth_ through parallel access to HDs. OTOH, > access time is _latency_, and that is not changed. Access time for a > RAID set is equal to that of the slowest access time, AKA highest > latency, HD in the RAID set. so I will max out at the 5.5-6ms range for access time?
> > > > Well, from what he has said, the total indexes are < 1GB and he has 6GB > > > of ram. So everything should fit. Not to mention he is only accessing > > > 5000/several million rows. > >I table spaced some of the indexes and they are around 211066880 bytes > >for the name_speed index and 149825330 for the lit_actor_speed index > >tables seem to be about a gig. > > Hmm. And you think you are only using 250MB out of your 6GB of > RAM? Something doesn't seem to add up here. From what's been > posted, I'd expect much more RAM to be in use. The cached memory usage is completely using up the rest of the memory. > > > > > > This implies a poor match between physical layout and access pattern. > > > > > > This seems to be the case. But since this is not the only query, it may > > > be that other access patterns are more important to optimize for. > > > > > > > > > > > If I understand correctly, the table should not be very fragmented given > > > > that this is a reasonably freshly loaded DB? That implies that the > > > > fields being looked up are not well sorted in the table compared to the > > > > query pattern. > > > > > > > > If the entire table could fit in RAM, this would be far less of a > > > > consideration. Failing that, the physical HD layout has to be improved > > > > or the query pattern has to be changed to reduce seeks. > > > > > > > > > > > > > > ... > > > > > > >> After CLUSTER, the current data will stay clustered, but new data will > > > >> not, so you have to continually CLUSTER, the same way that you might > > > >> VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as > > > >> expensive as a VACUUM FULL. Be aware of this, but it might vastly > > > >> improve your performance, so it would be worth it. > > > > > > > > > > > > CLUSTER can be a very large maintenance overhead/problem if the table(s) > > > > in question actually need to be "continually" re CLUSTER ed.
> > > > > > > > If there is no better solution available, then you do what you have to, > > > > but it feels like there should be a better answer here. > > > > > > > > Perhaps the DB schema needs examining to see if it matches up well with > > > > its real usage? > > > > > > > > Ron Peacetree > > > > > > > > > > I certainly agree that CLUSTER is expensive, and is an on-going > > > maintenance issue. If it is the normal access pattern, though, it may be > > > worth it. > > > >The query I've sent you is one of the most common I get; just change the > >name. I handle about 180K of them a day, mostly between 8 and 5. The > >clustering has never really been a problem. Like I said before, I do it > >about once a week. I handle about 3000 updates an hour consisting of > >about 1000-3000 statements per update, i.e. about 2.5 million updates per > >hour. In the last few months or so I've filtered these down to about > >400K update/delete/insert statements per hour. > > 2.5M updates per hour = ~695 updates per second. 400K per hour = > ~112 updates per sec. These should be well within the capabilities > of a RAID
Re: [PERFORM] extremly low memory usage
I'm just watching gnome-system-monitor. Which, after careful consideration and looking at dstat, means I'm on CRACK. GSM isn't showing cached memory usage. I assume that the cache memory usage is where data off of the disks would be cached...?

memory output from dstat is this for a few seconds:

---procs--- ------memory-usage------ ---paging-- ---disk/sda--:--disk/sdb-- ---swap--- ----total-cpu-usage----
run blk new|_used _buff _cach _free|__in_ _out_|_read write:_read write|_used _free|usr sys idl wai hiq siq
 0   0   0|1336M  10M 4603M  17M| 490B  833B|3823B 3503k:1607k 4285k|160k 2048M|  4   1  89   7   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0   464k|160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0    48k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0   132k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0    36k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0    12k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 2   0   0|1353M  10M 4585M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   1  75   0   0   0
 1   0   0|1321M  10M 4616M  19M|   0     0 |   0     0 :   0     0 |160k 2048M| 18   8  74   0   0   0
 1   0   0|1326M  10M 4614M  17M|   0     0 |   0     0 :4096B    0 |160k 2048M| 16  10  74   1   0   0
 1   0   0|1330M  10M 4609M  17M|   0     0 |   0    12k:4096B    0 |160k 2048M| 17   9  74   0   0   0
 0   1   0|1343M  10M 4596M  17M|   0     0 |   0     0 :   0   316M|160k 2048M|  5  10  74  11   0   1
 0   1   0|1339M  10M 4596M  21M|   0     0 |   0     0 :   0     0 |160k 2048M|  0   0  74  25   0   1
 0   2   0|1334M  10M 4596M  25M|   0     0 |   0  4096B:   0     0 |160k 2048M|  0   0  54  44   0   1
 1   0   0|1326M  10M 4596M  34M|   0     0 |   0     0 :   0   364k|160k 2048M|  4   1  60  34   0   1
 1   0   0|1290M  10M 4596M  70M|   0     0 |   0    12k:   0     0 |160k 2048M| 24   1  75   0   0   0
 1   0   0|1301M  10M 4596M  59M|   0     0 |   0    20k:   0     0 |160k 2048M| 21   4  75   0   0   0
 1   0   0|1312M  10M 4596M  48M|   0     0 |   0     0 :   0     0 |160k 2048M| 22   4  75   0   0   0
 1   0   0|1323M  10M 4596M  37M|   0     0 |   0     0 :   0    24k|160k 2048M| 21   4  75   0   0   0
 1   0   0|1334M  10M 4596M  25M|   0     0 |   0     0 :   0    56k|160k 2048M| 21   4  75   0   0   0

On Fri, 2005-08-19 at 16:07 -0500, John A Meinel wrote: > Jeremiah Jahn wrote: > > Rebuild in progress with just ext3 on the raid array...will see if this > > helps the access times. If it doesn't I'll mess with the stripe size. I > > have REINDEXED, CLUSTERED, tablespaced and cached with 'cat table/index > > > /dev/null'; none of this seems to have helped, or even increased my > > memory usage. argh! The only thing about this new system that I'm > > unfamiliar with is the array setup and LVM, which is why I think that's > > where the issue is. clustering and indexing as well as vacuum etc are > > things that I do and have been aware of for sometime. Perhaps slony is a > > factor, but I really don't see it causing problems on index read speed > > esp. when it's not running. > > > > thanx for your help, I really appreciate it. > > -jj- > > > > By the way, how are you measuring memory usage? Can you give the output > of that command, just to make sure you are reading it correctly. > > John > =:->

-- Speak softly and carry a +6 two-handed sword.
Re: [PERFORM] extremly low memory usage
At 04:11 PM 8/19/2005, Jeremiah Jahn wrote: On Fri, 2005-08-19 at 14:23 -0500, John A Meinel wrote: > Ron wrote: > > At 01:18 PM 8/19/2005, John A Meinel wrote: > > > >> Jeremiah Jahn wrote: > >> > Sorry about the formatting. > >> > > >> > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > >> > > >> >>Jeremiah Jahn wrote: > >> >> > >> >> > >> > >> ... > >> > >> >>The expensive parts are the 4915 lookups into the litigant_details > >> (each > >> >>one takes approx 4ms for a total of ~20s). > >> >>And then you do it again on case_data (average 3ms each * 4906 loops = > >> >>~15s). > >> > > >> > Is there some way to avoid this? > >> > > >> > >> Well, in general, 3ms for a single lookup seems really long. Maybe your > >> index is bloated by not vacuuming often enough. Do you tend to get a lot > >> of updates to litigant_details? > > > > > > Given that the average access time for a 15Krpm HD is in the 5.5-6ms > > range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single > > lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case) > > table accesses is requiring a seek. > > I think LVM may be a problem, since it also seems to break things up on the file system. My access time on the seek should be around 1/7th the 15Krpm I believe since it's a 14 disk raid 10 array. And no other traffic at the moment. Oops. There's a misconception here. RAID arrays increase _throughput_ AKA _bandwidth_ through parallel access to HDs. OTOH, access time is _latency_, and that is not changed. Access time for a RAID set is equal to that of the slowest access time, AKA highest latency, HD in the RAID set. > Well, from what he has said, the total indexes are < 1GB and he has 6GB > of ram. So everything should fit. Not to mention he is only accessing > 5000/several million rows. I table spaced some of the indexes and they are around 211066880 bytes for the name_speed index and 149825330 for the lit_actor_speed index tables seem to be about a gig. Hmm. 
And you think you are only using 250MB out of your 6GB of RAM? Something doesn't seem to add up here. From what's been posted, I'd expect much more RAM to be in use. > > This implies a poor match between physical layout and access pattern. > > This seems to be the case. But since this is not the only query, it may > be that other access patterns are more important to optimize for. > > > > > If I understand correctly, the table should not be very fragmented given > > that this is a reasonably freshly loaded DB? That implies that the > > fields being looked up are not well sorted in the table compared to the > > query pattern. > > > > If the entire table could fit in RAM, this would be far less of a > > consideration. Failing that, the physical HD layout has to be improved > > or the query pattern has to be changed to reduce seeks. > > > > > > ... > > >> After CLUSTER, the current data will stay clustered, but new data will > >> not, so you have to continually CLUSTER, the same way that you might > >> VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as > >> expensive as a VACUUM FULL. Be aware of this, but it might vastly > >> improve your performance, so it would be worth it. > > > > > > CLUSTER can be a very large maintenance overhead/problem if the table(s) > > in question actually need to be "continually" re CLUSTER ed. > > > > If there is no better solution available, then you do what you have to, > > but it feels like there should be a better answer here. > > > > Perhaps the DB schema needs examining to see if it matches up well with > > its real usage? > > > > Ron Peacetree > > > > I certainly agree that CLUSTER is expensive, and is an on-going > maintenance issue. If it is the normal access pattern, though, it may be > worth it. The query I've sent you is one of the most common I get just change the name. I handle about 180K of them a day mostly between 8 and 5. The clustering has never really been a problem. 
Like I said before, I do it about once a week. I handle about 3000 updates an hour consisting of about 1000-3000 statements per update, i.e. about 2.5 million updates per hour. In the last few months or so I've filtered these down to about 400K update/delete/insert statements per hour. 2.5M updates per hour = ~695 updates per second. 400K per hour = ~112 updates per sec. These should be well within the capabilities of a RAID 10 subsystem based on 14 15Krpm HDs assuming a decent RAID card. What is the exact HW of the RAID subsystem involved and how is it configured? You shouldn't be having a performance problem AFAICT... > I also wonder, though, if his table is properly normalized. Which, as you mentioned, might lead to improved access patterns. The system is about as normalized as I can get it. In general the layout is the following: courts have cases, cases have litigant_details. Actors have identities and litigant_details. Hmmm. Can you tell us more about the actual schema, I may h
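Ron's rate conversion above is easy to verify (his ~695 and ~112 are rounded up; integer division rounds down):

```shell
# Updates per second implied by the hourly figures quoted in the thread.
echo "2.5M/hour -> $((2500000 / 3600)) updates/sec"
echo "400K/hour -> $((400000 / 3600)) updates/sec"
```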
Re: [PERFORM] extremly low memory usage
Michael Stone <[EMAIL PROTECTED]> writes: > On Sat, Aug 20, 2005 at 02:17:54PM +0300, Marko Ristola wrote: >> Based on my knowledge, Ext3 is good with keeping filesystem integrity >> AND data integrity while pressing the reset button. However, by >> selecting data=writeback, you gain more speed, but you risk the data >> integrity during a crash: Ext3 guarantees only filesystem integrity. > That's why postgres keeps its own transaction log. Any of these > filesystems guarantee data integrity for data that's been synced to > disk, and postgres keeps track of what data has and has not been > committed so it can recover gracefully from a failure. Right. I think the optimal setting for a Postgres data directory is journaled metadata, non-journaled file content. Postgres can take care of the data integrity for itself, but it does assume that the filesystem stays structurally sane (eg, data blocks don't get reassigned to the wrong file), so you need a filesystem guarantee about the metadata. WAL files are handled in a much more conservative way (created, filled with zeroes, and fsync'd before we ever put any valuable data in 'em). If you have WAL on its own drive then I think Mike's recommendation of no filesystem journalling at all for that drive is probably OK. Or you can do same as above (journal metadata only) if you want a little extra protection. And of course all this reasoning depends on the assumption that the drive tells the truth about write-completion. If the drive does write caching it had better be able to complete all its accepted writes before dying in a power failure. (Hence, battery-backed write cache is OK, any other kind is evil.) regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
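A concrete sketch of that layout, assuming the data directory and WAL sit on separate drives. The device names and mount points below are invented for illustration, not taken from the thread; on ext3, data=writeback journals metadata only, and plain ext2 has no journal at all:

```shell
# Hypothetical /etc/fstab entries illustrating the advice above.
# Device names and mount points are examples only.

# Data directory: journal metadata, not file contents, and skip
# atime updates while we're at it.
#   /dev/sdb1  /var/lib/pgsql/data  ext3  noatime,data=writeback  0 0

# WAL (pg_xlog) drive: no journalling at all, per Mike's suggestion.
#   /dev/sdc1  /var/lib/pgsql/wal   ext2  noatime  0 0
```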
Re: [PERFORM] extremly low memory usage
On Sat, Aug 20, 2005 at 02:17:54PM +0300, Marko Ristola wrote: Based on my knowledge, Ext3 is good with keeping filesystem integrity AND data integrity while pressing the reset button. However, by selecting data=writeback, you gain more speed, but you risk the data integrity during a crash: Ext3 guarantees only filesystem integrity. That's why postgres keeps its own transaction log. Any of these filesystems guarantee data integrity for data that's been synced to disk, and postgres keeps track of what data has and has not been committed so it can recover gracefully from a failure. That's why most filesystems are designed the way they are; the application can determine what things need better data integrity and which need better performance on a case-by-case basis. Mike Stone ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] extremly low memory usage
On Sat, Aug 20, 2005 at 01:12:15AM -0600, Dan Harris wrote: XFS seems to be a trusted choice, followed by Reiser and JFS both with the occasional controversy when the comparisons pop up. And don't put the xlog on a journaled filesystem. There is no advantage to doing so, and it will slow things down. (Assuming a sane seperate xlog partition configuration, sized reasonably.) Mike Stone ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] extremly low memory usage
Dan Harris wrote: > From my recent experiences, I can say ext3 is probably not a great > choice for Pg databases. If you check the archives you'll see > there's a lot of discussion about various journalling filesystems and > ext3 usually (always?) comes up at the bottom as far as performance > goes. If you insist on using it, I would at least recommend the > noatime option in fstab and using data=writeback to get the faster Based on my knowledge, Ext3 is good with keeping filesystem integrity AND data integrity while pressing the reset button. However, by selecting data=writeback, you gain more speed, but you risk the data integrity during a crash: Ext3 guarantees only filesystem integrity. This means with database transaction logs: The last transactions are not guaranteed to be written into the hard drives during a hardware reset, meaning a loss of some committed transactions. Reiserfs is known to do things this wrong way as well. Is there a way with a Reiserfs filesystem to fulfill both filesystem AND data integrity requirements nowadays? See for example "man mount" for the effects of data=journal, data=ordered (default) and data=writeback for Ext3. Only writeback risks data integrity. Ext3 is the only journaled filesystem that I know of that fulfills these fundamental data integrity guarantees. Personally I like such filesystems, even though they mean less speed. Marko Ristola ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] extremly low memory usage
On Aug 19, 2005, at 3:01 PM, Jeremiah Jahn wrote: Rebuild in progress with just ext3 on the raid array...will see if this helps the access times. From my recent experiences, I can say ext3 is probably not a great choice for Pg databases. If you check the archives you'll see there's a lot of discussion about various journalling filesystems and ext3 usually (always?) comes up at the bottom as far as performance goes. If you insist on using it, I would at least recommend the noatime option in fstab and using data=writeback to get the faster of the journal modes. XFS seems to be a trusted choice, followed by Reiser and JFS, both with the occasional controversy when the comparisons pop up. -Dan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote: > Rebuild in progress with just ext3 on the raid array...will see if this > helps the access times. If it doesn't I'll mess with the stripe size. I > have REINDEXED, CLUSTERED, tablespaced and cached with 'cat table/index > /dev/null'; none of this seems to have helped, or even increased my > memory usage. argh! The only thing about this new system that I'm > unfamiliar with is the array setup and LVM, which is why I think that's > where the issue is. clustering and indexing as well as vacuum etc are > things that I do and have been aware of for some time. Perhaps slony is a > factor, but I really don't see it causing problems on index read speed > esp. when it's not running. > > thanx for your help, I really appreciate it. > -jj- > By the way, how are you measuring memory usage? Can you give the output of that command, just to make sure you are reading it correctly. John =:->
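On the measurement question: on Linux, top and free count the page cache separately from process memory, which is the usual reason a busy database box appears to be using almost no RAM. A quick way to see what the kernel is actually caching (a sketch; the field names are standard Linux /proc/meminfo fields):

```shell
# Show total RAM and how much the kernel is holding as buffer/page cache.
# A warm database box should show a large "Cached:" figure even when
# per-process "used" memory looks tiny.
grep -E '^(MemTotal|Buffers|Cached):' /proc/meminfo
```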
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote: > On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote: > >>Jeremiah Jahn wrote: >> ... >> >>Well, in general, 3ms for a single lookup seems really long. Maybe your >>index is bloated by not vacuuming often enough. Do you tend to get a lot >>of updates to litigant_details? > > I have vacuumed this already. I get lots of updates, but this data is > mostly unchanging. > > >>There are a couple possibilities at this point. First, you can REINDEX >>the appropriate index, and see if that helps. However, if this is a test >>box, it sounds like you just did a dump and reload, which wouldn't have >>bloat in an index. > > > I loaded it using slony I don't know that slony versus pg_dump/pg_restore really matters. The big thing is that Updates wouldn't be trashing your index. But if you are saying that you cluster once/wk your index can't be that messed up anyway. (Unless CLUSTER messes up the non-clustered indexes, but that would make cluster much less useful, so I would have guessed this was not the case) > > >>Another possibility. Is this the column that you usually use when >>pulling information out of litigant_details? If so, you can CLUSTER >>litigant_details on the appropriate index. This will help things be >>close together that should be, which decreases the index lookup costs. > > clustering on this right now. Most of the other things are already > clustered. name and case_data Just as a reality check, they are clustered on the columns in question, right? (I don't know if this column is a primary key or not, but any index can be used for clustering). > > >>However, if this is not the common column, then you probably will slow >>down whatever other accesses you may have on this table. >> >>After CLUSTER, the current data will stay clustered, but new data will >>not, so you have to continually CLUSTER, the same way that you might >>VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as >>expensive as a VACUUM FULL. 
Be aware of this, but it might vastly >>improve your performance, so it would be worth it. > > I generally re-cluster once a week. > >>> So there is no need for preloading your indexes on the identity table. It is definitely not the bottleneck. So a few design bits, which may help your database. Why is "actor_id" a text field instead of a number? >>> >>>This is simply due to the nature of the data. >>> >> >>I'm just wondering if changing into a number, and using a number->name >>lookup would be faster for you. It may not be. In general, I prefer to >>use numbers for references. I may be over paranoid, but I know that some >>locales are bad with string -> string comparisons. And since the data in >>your database is stored as UNICODE, I'm not sure if it has to do any >>translating or not. Again, something to consider, it may not make any >>difference. > > I don't believe so. I initialize the DB as 'lang=C'. I used to have the > problem where things were being inited as en_US. This would prevent any > text based index from working. This doesn't seem to be the case here, so > I'm not worried about it. > Sorry, I think I was confusing you with someone else who posted SHOW ALL. > > >> ... > it's cached alright. I'm getting a read rate of about 150MB/sec. I would > have thought it would be faster with my raid setup. I think I'm going to > scrap the whole thing and get rid of LVM. I'll just do a straight ext3 > system. Maybe that will help. Still trying to get suggestions for a > stripe size. > I don't think 150MB/s is out of the realm for a 14 drive array. How fast is time dd if=/dev/zero of=testfile bs=8192 count=1000000 (That should create an 8GB file, which is too big to cache everything) And then how fast is: time dd if=testfile of=/dev/null bs=8192 count=1000000 That should give you a semi-decent way of measuring how fast the RAID system is, since it should be too big to cache in ram.
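Converting the timings from a test like that into a throughput number is just bytes over elapsed seconds; a throwaway helper (the 55-second figure below is made up purely for illustration):

```shell
# MB/s from bytes transferred and elapsed seconds (integer arithmetic).
mbps() { echo $(( $1 / $2 / 1048576 )); }

# e.g. the ~8GB dd test file read back in a hypothetical 55 seconds:
mbps $(( 8192 * 1000000 )) 55    # prints 142
```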
> >>I can point you to REINDEX and CLUSTER, but if it is caching in ram, I >>honestly can't say why the per loop would be that much slower. >>Are both systems running the same postgres version? It sounds like it is >>different (since you say something about switching to 8.0). > > These had little or no effect. > The production machine is running 7.4 while the devel machine is running > 8.0 > Well, my concern is that maybe some portion of the 8.0 code actually slowed things down for you. You could try reverting to 7.4 on the devel box, though I think playing with upgrading to 8.1 might be more worthwhile. ... > > this is a cached version. > I assume that you mean this is the second run of the query. I can't compare it too much, since this is "smith" rather than "jones". But this one is 17s rather than the other one being 46s. And that includes having 8k rows instead of having 5k rows. Have you tried other values with disabled nested loops? Because this query (at least in cached form) seems to be *way* faster than with nested loops. I know that you
Re: [PERFORM] extremly low memory usage
Rebuild in progress with just ext3 on the raid array...will see if this helps the access times. If it doesn't I'll mess with the stripe size. I have REINDEXED, CLUSTERED, tablespaced and cached with 'cat table/index > /dev/null' none of this seems to have helped, or even increased my memory usage. argh! The only thing about this new system that I'm unfamiliar with is the array setup and LVM, which is why I think that's where the issue is. clustering and indexing as well as vacuum etc are things that I do and have been aware of for sometime. Perhaps slony is a factor, but I really don't see it causing problems on index read speed esp. when it's not running. thanx for your help, I really appreciate it. -jj- On Fri, 2005-08-19 at 14:23 -0500, John A Meinel wrote: > Ron wrote: > > At 01:18 PM 8/19/2005, John A Meinel wrote: > > > >> Jeremiah Jahn wrote: > >> > Sorry about the formatting. > >> > > >> > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > >> > > >> >>Jeremiah Jahn wrote: > >> >> > >> >> > >> > >> ... > >> > >> >>The expensive parts are the 4915 lookups into the litigant_details > >> (each > >> >>one takes approx 4ms for a total of ~20s). > >> >>And then you do it again on case_data (average 3ms each * 4906 loops = > >> >>~15s). > >> > > >> > Is there some way to avoid this? > >> > > >> > >> Well, in general, 3ms for a single lookup seems really long. Maybe your > >> index is bloated by not vacuuming often enough. Do you tend to get a lot > >> of updates to litigant_details? > > > > > > Given that the average access time for a 15Krpm HD is in the 5.5-6ms > > range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single > > lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case) > > table accesses is requiring a seek. > > > > > Well, from what he has said, the total indexes are < 1GB and he has 6GB > of ram. So everything should fit. Not to mention he is only accessing > 5000/several million rows. 
> > > > This implies a poor match between physical layout and access pattern. > > This seems to be the case. But since this is not the only query, it may > be that other access patterns are more important to optimize for. > > > > > If I understand correctly, the table should not be very fragmented given > > that this is a reasonably freshly loaded DB? That implies that the > > fields being looked up are not well sorted in the table compared to the > > query pattern. > > > > If the entire table could fit in RAM, this would be far less of a > > consideration. Failing that, the physical HD layout has to be improved > > or the query pattern has to be changed to reduce seeks. > > > > > > ... > > >> After CLUSTER, the current data will stay clustered, but new data will > >> not, so you have to continually CLUSTER, the same way that you might > >> VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as > >> expensive as a VACUUM FULL. Be aware of this, but it might vastly > >> improve your performance, so it would be worth it. > > > > > > CLUSTER can be a very large maintenance overhead/problem if the table(s) > > in question actually need to be "continually" re CLUSTER ed. > > > > If there is no better solution available, then you do what you have to, > > but it feels like there should be a better answer here. > > > > Perhaps the DB schema needs examining to see if it matches up well with > > its real usage? > > > > Ron Peacetree > > > > I certainly agree that CLUSTER is expensive, and is an on-going > maintenance issue. If it is the normal access pattern, though, it may be > worth it. > > I also wonder, though, if his table is properly normalized. Which, as > you mentioned, might lead to improved access patterns. > > John > =:-> -- Speak softly and carry a +6 two-handed sword. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] extremly low memory usage
On Fri, 2005-08-19 at 14:23 -0500, John A Meinel wrote: > Ron wrote: > > At 01:18 PM 8/19/2005, John A Meinel wrote: > > > >> Jeremiah Jahn wrote: > >> > Sorry about the formatting. > >> > > >> > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > >> > > >> >>Jeremiah Jahn wrote: > >> >> > >> >> > >> > >> ... > >> > >> >>The expensive parts are the 4915 lookups into the litigant_details > >> (each > >> >>one takes approx 4ms for a total of ~20s). > >> >>And then you do it again on case_data (average 3ms each * 4906 loops = > >> >>~15s). > >> > > >> > Is there some way to avoid this? > >> > > >> > >> Well, in general, 3ms for a single lookup seems really long. Maybe your > >> index is bloated by not vacuuming often enough. Do you tend to get a lot > >> of updates to litigant_details? > > > > > > Given that the average access time for a 15Krpm HD is in the 5.5-6ms > > range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single > > lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case) > > table accesses is requiring a seek. > > I think LVM may be a problem, since it also seems to break things up on the file system. My access time on the seek should be around 1/7th that of a single 15Krpm drive, I believe, since it's a 14 disk raid 10 array. And no other traffic at the moment. > > > Well, from what he has said, the total indexes are < 1GB and he has 6GB > of ram. So everything should fit. Not to mention he is only accessing > 5000/several million rows. I tablespaced some of the indexes and they are around 211066880 bytes for the name_speed index and 149825330 for the lit_actor_speed index. The tables seem to be about a gig. > > > > This implies a poor match between physical layout and access pattern. > > This seems to be the case. But since this is not the only query, it may > be that other access patterns are more important to optimize for.
> > > > > If I understand correctly, the table should not be very fragmented given > > that this is a reasonably freshly loaded DB? That implies that the > > fields being looked up are not well sorted in the table compared to the > > query pattern. > > > > If the entire table could fit in RAM, this would be far less of a > > consideration. Failing that, the physical HD layout has to be improved > > or the query pattern has to be changed to reduce seeks. > > > > > > ... > > >> After CLUSTER, the current data will stay clustered, but new data will > >> not, so you have to continually CLUSTER, the same way that you might > >> VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as > >> expensive as a VACUUM FULL. Be aware of this, but it might vastly > >> improve your performance, so it would be worth it. > > > > > > CLUSTER can be a very large maintenance overhead/problem if the table(s) > > in question actually need to be "continually" re CLUSTER ed. > > > > If there is no better solution available, then you do what you have to, > > but it feels like there should be a better answer here. > > > > Perhaps the DB schema needs examining to see if it matches up well with > > its real usage? > > > > Ron Peacetree > > > > I certainly agree that CLUSTER is expensive, and is an on-going > maintenance issue. If it is the normal access pattern, though, it may be > worth it. The query I've sent you is one of the most common I get, just change the name. I handle about 180K of them a day, mostly between 8 and 5. The clustering has never really been a problem. Like I said before I do it about once a week. I handle about 3000 updates an hour, consisting of about 1000-3000 statements per update, i.e. about 2.5 million updates per hour. In the last few months or so I've filtered these down to about 400K update/delete/insert statements per hour. > > I also wonder, though, if his table is properly normalized. Which, as > you mentioned, might lead to improved access patterns.
The system is about as normalized as I can get it. In general the layout is the following: courts have cases, cases have litigant_details. Actors have identities and litigant_details. > > John > =:-> -- Speak softly and carry a +6 two-handed sword. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] extremly low memory usage
On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote: > Jeremiah Jahn wrote: > > Sorry about the formatting. > > > > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > > > >>Jeremiah Jahn wrote: > >> > >> > > ... > > >>The expensive parts are the 4915 lookups into the litigant_details (each > >>one takes approx 4ms for a total of ~20s). > >>And then you do it again on case_data (average 3ms each * 4906 loops = > >>~15s). > > > > Is there some way to avoid this? > > > > Well, in general, 3ms for a single lookup seems really long. Maybe your > index is bloated by not vacuuming often enough. Do you tend to get a lot > of updates to litigant_details? I have vacuumed this already. I get lots of updates, but this data is mostly unchanging. > > There are a couple possibilities at this point. First, you can REINDEX > the appropriate index, and see if that helps. However, if this is a test > box, it sounds like you just did a dump and reload, which wouldn't have > bloat in an index. I loaded it using slony > > Another possibility. Is this the column that you usually use when > pulling information out of litigant_details? If so, you can CLUSTER > litigant_details on the appropriate index. This will help things be > close together that should be, which decreases the index lookup costs. clustering on this right now. Most of the other things are already clustered. name and case_data > > However, if this is not the common column, then you probably will slow > down whatever other accesses you may have on this table. > > After CLUSTER, the current data will stay clustered, but new data will > not, so you have to continually CLUSTER, the same way that you might > VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as > expensive as a VACUUM FULL. Be aware of this, but it might vastly > improve your performance, so it would be worth it. I generally re-cluster once a week. > > > > > > >>So there is no need for preloading your indexes on the identity table. 
> >>It is definitely not the bottleneck. > >> > >>So a few design bits, which may help your database. > >>Why is "actor_id" a text field instead of a number? > > > > This is simply due to the nature of the data. > > > > I'm just wondering if changing into a number, and using a number->name > lookup would be faster for you. It may not be. In general, I prefer to > use numbers for references. I may be over paranoid, but I know that some > locales are bad with string -> string comparisons. And since the data in > your database is stored as UNICODE, I'm not sure if it has to do any > translating or not. Again, something to consider, it may not make any > difference. I don't believe so. I initialize the DB as 'lang=C'. I used to have the problem where things were being inited as en_US. This would prevent any text based index from working. This doesn't seem to be the case here, so I'm not worried about it. > > > > > >>You could try creating an index on "litigant_details (actor_id, > >>count_ori)" so that it can do just an index lookup, rather than an index > >>+ filter. > > > > I have one, but it doesn't seem to like to use it. Don't really need it > > though, I can just drop the court_id out of the query. It's redundant, > > since each actor_id is also unique in litigant details. I had run vac > > full and analyze but I ran them again anyway and the planning improved. > > However, my 14 disk raid 10 array is still slower than my 3 disk raid 5 > > on my production box. 46sec vs 30sec (with live traffic on the > > production) One of the strange things is that when I run the cat command > > on my index and tables that are "HOT" it has no effect on memory usage. > > Right now I'm running ext3 on LVM. I'm still in a position to redo the > > file system and everything. Is this a good way to do it or should I > > switch to something else? What about stripe and extent sizes...? kernel > > parameters to change? > > Well, the plans are virtually identical.
There is one small difference > as to whether it joins against case_data or court first. But 'court' is > very tiny (small enough to use a seqscan instead of index scan) I'm a > little surprised with court being this small that it doesn't do > something like a hash aggregation, but court takes no time anyway. > > The real problem is that your nested loop index time is *much* slower. > > Devel: > -> Index Scan using lit_actor_speed on litigant_details > (cost=0.00..3.96 rows=1 width=81) > (actual time=4.788..4.812 rows=1 loops=5057) > > Production: > -> Index Scan using lit_actor_speed on litigant_details > (cost=0.00..5.63 rows=1 width=81) > (actual time=3.355..3.364 rows=1 loops=5057) > > Devel: > -> Index Scan using case_speed on case_data > (cost=0.00..3.46 rows=1 width=26) > (actual time=4.222..4.230 rows=1 loops=5052) > > Production: > -> Index Scan using case_data_pkey on case_data > (cost=0.00..5.31 rows=1 width=26) > (actual ti
Re: [PERFORM] extremly low memory usage
Ron wrote: > At 01:18 PM 8/19/2005, John A Meinel wrote: > >> Jeremiah Jahn wrote: >> > Sorry about the formatting. >> > >> > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: >> > >> >>Jeremiah Jahn wrote: >> >> >> >> >> >> ... >> >> >>The expensive parts are the 4915 lookups into the litigant_details >> (each >> >>one takes approx 4ms for a total of ~20s). >> >>And then you do it again on case_data (average 3ms each * 4906 loops = >> >>~15s). >> > >> > Is there some way to avoid this? >> > >> >> Well, in general, 3ms for a single lookup seems really long. Maybe your >> index is bloated by not vacuuming often enough. Do you tend to get a lot >> of updates to litigant_details? > > > Given that the average access time for a 15Krpm HD is in the 5.5-6ms > range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single > lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case) > table accesses is requiring a seek. > Well, from what he has said, the total indexes are < 1GB and he has 6GB of ram. So everything should fit. Not to mention he is only accessing 5000/several million rows. > This implies a poor match between physical layout and access pattern. This seems to be the case. But since this is not the only query, it may be that other access patterns are more important to optimize for. > > If I understand correctly, the table should not be very fragmented given > that this is a reasonably freshly loaded DB? That implies that the > fields being looked up are not well sorted in the table compared to the > query pattern. > > If the entire table could fit in RAM, this would be far less of a > consideration. Failing that, the physical HD layout has to be improved > or the query pattern has to be changed to reduce seeks. > > ... >> After CLUSTER, the current data will stay clustered, but new data will >> not, so you have to continually CLUSTER, the same way that you might >> VACUUM. 
*However*, IIRC CLUSTER grabs an Exclusive lock, so it is as >> expensive as a VACUUM FULL. Be aware of this, but it might vastly >> improve your performance, so it would be worth it. > > > CLUSTER can be a very large maintenance overhead/problem if the table(s) > in question actually need to be "continually" re CLUSTER ed. > > If there is no better solution available, then you do what you have to, > but it feels like there should be a better answer here. > > Perhaps the DB schema needs examining to see if it matches up well with > its real usage? > > Ron Peacetree > I certainly agree that CLUSTER is expensive, and is an on-going maintenance issue. If it is the normal access pattern, though, it may be worth it. I also wonder, though, if his table is properly normalized. Which, as you mentioned, might lead to improved access patterns. John =:->
Re: [PERFORM] extremly low memory usage
At 01:18 PM 8/19/2005, John A Meinel wrote: Jeremiah Jahn wrote: > Sorry about the formatting. > > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > >>Jeremiah Jahn wrote: >> >> ... >>The expensive parts are the 4915 lookups into the litigant_details (each >>one takes approx 4ms for a total of ~20s). >>And then you do it again on case_data (average 3ms each * 4906 loops = >>~15s). > > Is there some way to avoid this? > Well, in general, 3ms for a single lookup seems really long. Maybe your index is bloated by not vacuuming often enough. Do you tend to get a lot of updates to litigant_details? Given that the average access time for a 15Krpm HD is in the 5.5-6ms range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case) table accesses is requiring a seek. This implies a poor match between physical layout and access pattern. If I understand correctly, the table should not be very fragmented given that this is a reasonably freshly loaded DB? That implies that the fields being looked up are not well sorted in the table compared to the query pattern. If the entire table could fit in RAM, this would be far less of a consideration. Failing that, the physical HD layout has to be improved or the query pattern has to be changed to reduce seeks. There are a couple possibilities at this point. First, you can REINDEX the appropriate index, and see if that helps. However, if this is a test box, it sounds like you just did a dump and reload, which wouldn't have bloat in an index. Another possibility. Is this the column that you usually use when pulling information out of litigant_details? If so, you can CLUSTER litigant_details on the appropriate index. This will help things be close together that should be, which decreases the index lookup costs. However, if this is not the common column, then you probably will slow down whatever other accesses you may have on this table. 
After CLUSTER, the current data will stay clustered, but new data will not, so you have to continually CLUSTER, the same way that you might VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as expensive as a VACUUM FULL. Be aware of this, but it might vastly improve your performance, so it would be worth it. CLUSTER can be a very large maintenance overhead/problem if the table(s) in question actually need to be "continually" re CLUSTER ed. If there is no better solution available, then you do what you have to, but it feels like there should be a better answer here. Perhaps the DB schema needs examining to see if it matches up well with its real usage? Ron Peacetree ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
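For reference, the weekly re-CLUSTER routine Jeremiah describes could be scripted along these lines. This is only a sketch: the database name is invented, the table/index pairings are guesses from names mentioned in the thread, and it uses the pre-8.1 "CLUSTER index ON table" syntax current when this was written. Each CLUSTER holds an exclusive lock for its duration, so schedule it off-hours.

```shell
# Hypothetical weekly maintenance job; names are assumptions (see above).
psql -d courts -c 'CLUSTER lit_actor_speed ON litigant_details;'
psql -d courts -c 'CLUSTER case_speed ON case_data;'
# Refresh planner statistics after the physical reorder.
psql -d courts -c 'ANALYZE;'
```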
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote: > Sorry about the formatting. > > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > >>Jeremiah Jahn wrote: >> >> ... >>The expensive parts are the 4915 lookups into the litigant_details (each >>one takes approx 4ms for a total of ~20s). >>And then you do it again on case_data (average 3ms each * 4906 loops = >>~15s). > > Is there some way to avoid this? > Well, in general, 3ms for a single lookup seems really long. Maybe your index is bloated by not vacuuming often enough. Do you tend to get a lot of updates to litigant_details? There are a couple possibilities at this point. First, you can REINDEX the appropriate index, and see if that helps. However, if this is a test box, it sounds like you just did a dump and reload, which wouldn't have bloat in an index. Another possibility. Is this the column that you usually use when pulling information out of litigant_details? If so, you can CLUSTER litigant_details on the appropriate index. This will help things be close together that should be, which decreases the index lookup costs. However, if this is not the common column, then you probably will slow down whatever other accesses you may have on this table. After CLUSTER, the current data will stay clustered, but new data will not, so you have to continually CLUSTER, the same way that you might VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as expensive as a VACUUM FULL. Be aware of this, but it might vastly improve your performance, so it would be worth it. > > >>So there is no need for preloading your indexes on the identity table. >>It is definitely not the bottleneck. >> >>So a few design bits, which may help your database. >>Why is "actor_id" a text field instead of a number? > > This is simply due to the nature of the data. > I'm just wondering if changing into a number, and using a number->name lookup would be faster for you. It may not be. In general, I prefer to use numbers for references. 
I may be overly paranoid, but I know that some locales are slow with string-to-string comparisons. And since the data in your database is stored as UNICODE, I'm not sure whether it has to do any translating or not. Again, something to consider; it may not make any difference.

>>You could try creating an index on "litigant_details (actor_id,
>>court_ori)" so that it can do just an index lookup, rather than an index
>>+ filter.
>
> I have one, but it doesn't seem to like to use it. Don't really need it
> though; I can just drop the court_id out of the query. It's redundant,
> since each actor_id is also unique in litigant_details. I had run vacuum
> full and analyze, but I ran them again anyway and the planning improved.
> However, my 14-disk RAID 10 array is still slower than the 3-disk RAID 5
> on my production box: 46 sec vs 30 sec (with live traffic on
> production). One of the strange things is that when I run the cat
> command on my indexes and tables that are "HOT", it has no effect on
> memory usage. Right now I'm running ext3 on LVM. I'm still in a position
> to redo the file system and everything. Is this a good way to do it, or
> should I switch to something else? What about stripe and extent
> sizes...? Kernel parameters to change?

Well, the plans are virtually identical. There is one small difference as to whether it joins against case_data or court first, but 'court' is very tiny (small enough to use a seqscan instead of an index scan). I'm a little surprised, with court being this small, that it doesn't do something like a hash aggregation, but court takes no time anyway.

The real problem is that your nested loop index time is *much* slower.
Devel:
  -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.788..4.812 rows=1 loops=5057)
Production:
  -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..5.63 rows=1 width=81) (actual time=3.355..3.364 rows=1 loops=5057)

Devel:
  -> Index Scan using case_speed on case_data (cost=0.00..3.46 rows=1 width=26) (actual time=4.222..4.230 rows=1 loops=5052)
Production:
  -> Index Scan using case_data_pkey on case_data (cost=0.00..5.31 rows=1 width=26) (actual time=1.897..1.904 rows=1 loops=5052)

Notice that the actual per-row time is as much as 1/2 lower on production than on your devel box.

As a test, can you do "time cat $index_file >/dev/null" a couple of times, and then work out the MB/s? Alternatively, run vmstat in another shell: if the reads/s doesn't change, then you know the "cat" is being served from RAM, and thus the file really is cached.

I can point you to REINDEX and CLUSTER, but if it is already cached in RAM, I honestly can't say why the per-loop time would be that much slower. Are both systems running the same postgres version? It sounds like they are different (since you say something about switching to 8.0). I doubt it will help, but you might try an 8.1devel version.

...
>>Do you regularly vacuum analyze your tables?
Re: [PERFORM] extremly low memory usage
Sorry about the formatting.

On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote:
> Jeremiah Jahn wrote:
> >here's an example standard query. I really have to make the first hit go
> >faster. The table is clustered on full_name as well. 'Smith%'
> >took 87 seconds on the first hit. I wonder if I set up my array wrong.
> >I remember seeing something about DMA access versus something else, and
> >chose DMA access. LVM maybe?
>
> It would be nice if you would format your queries to be a little bit
> easier to read before posting them.
> However, I believe I am reading it correctly, to say that the index scan
> on identity is not your slow point. In fact, as near as I can tell, it
> only takes 52ms to complete.
>
> The expensive parts are the 4915 lookups into litigant_details (each
> one takes approx 4ms for a total of ~20s).
> And then you do it again on case_data (average 3ms each * 4906 loops =
> ~15s).

Is there some way to avoid this?

> So there is no need for preloading your indexes on the identity table.
> It is definitely not the bottleneck.
>
> So a few design bits, which may help your database.
> Why is "actor_id" a text field instead of a number?

This is simply due to the nature of the data.

> You could try creating an index on "litigant_details (actor_id,
> court_ori)" so that it can do just an index lookup, rather than an index
> + filter.

I have one, but it doesn't seem to like to use it. Don't really need it though; I can just drop the court_id out of the query. It's redundant, since each actor_id is also unique in litigant_details. I had run vacuum full and analyze, but I ran them again anyway and the planning improved. However, my 14-disk RAID 10 array is still slower than my 3-disk RAID 5 on my production box: 46 sec vs 30 sec (with live traffic on production). One of the strange things is that when I run the cat command on my indexes and tables that are "HOT", it has no effect on memory usage. Right now I'm running ext3 on LVM.
I'm still in a position to redo the file system and everything. Is this a good way to do it, or should I switch to something else? What about stripe and extent sizes...? Kernel parameters to change?

---devel box:---
copa=# EXPLAIN ANALYZE select full_name,identity_id,identity.case_id,court.id,date_of_birth,assigned_case_role,litigant_details.impound_litigant_data
copa-# from identity
copa-# join litigant_details on identity.actor_id = litigant_details.actor_id
copa-# join case_data on litigant_details.case_id = case_data.case_id and litigant_details.court_ori = case_data.court_ori
copa-# join court on identity.court_ori = court.id
copa-# where identity.court_ori = 'IL081025J' and full_name like 'JONES%' order by full_name;

QUERY PLAN
Nested Loop (cost=3.29..29482.22 rows=3930 width=86) (actual time=114.060..46001.480 rows=5052 loops=1)
  -> Nested Loop (cost=3.29..16193.27 rows=3820 width=112) (actual time=93.038..24584.275 rows=5052 loops=1)
    -> Nested Loop (cost=0.00..16113.58 rows=3820 width=113) (actual time=85.778..24536.489 rows=5052 loops=1)
      -> Index Scan using name_speed on identity (cost=0.00..824.72 rows=3849 width=82) (actual time=50.284..150.133 rows=5057 loops=1)
        Index Cond: (((full_name)::text >= 'JONES'::character varying) AND ((full_name)::text < 'JONET'::character varying))
        Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'JONES%'::text))
      -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.788..4.812 rows=1 loops=5057)
        Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
    -> Materialize (cost=3.29..3.30 rows=1 width=12) (actual time=0.002..0.003 rows=1 loops=5052)
      -> Seq Scan on court (cost=0.00..3.29 rows=1 width=12) (actual time=7.248..7.257 rows=1 loops=1)
        Filter: ('IL081025J'::text = (id)::text)
  -> Index Scan using case_speed on case_data (cost=0.00..3.46 rows=1 width=26) (actual time=4.222..4.230 rows=1 loops=5052)
    Index Cond:
((("outer".court_ori)::text = (case_data.court_ori)::text) AND (("outer".case_id)::text = (case_data.case_id)::text))
Total runtime: 46005.994 ms

> More importantly, though, the planner seems to think the join of
> identity to litigant_details will only return 1 row, not 5000.
> Do you regularly vacuum analyze your tables?
> Just as a test, try running:
> set enable_nested_loop to off;

Not quite acceptable: Total runtime: 221486.149 ms

> And then run EXPLAIN ANALYZE again, just to see if it is faster.
>
> You probably need to increase some statistics targets, so that the
> planner can design better plans.
Re: [PERFORM] extremly low memory usage
At 01:55 PM 8/18/2005, John Arbash Meinel wrote:

Jeremiah Jahn wrote:
>here's an example standard query. I really have to make the first hit go
>faster. The table is clustered on full_name as well. 'Smith%'
>took 87 seconds on the first hit. I wonder if I set up my array wrong.
>I remember seeing something about DMA access versus something else, and
>chose DMA access. LVM maybe?

It would be nice if you would format your queries to be a little bit easier to read before posting them.
However, I believe I am reading it correctly, to say that the index scan on identity is not your slow point. In fact, as near as I can tell, it only takes 52ms to complete.

The expensive parts are the 4915 lookups into litigant_details (each one takes approx 4ms for a total of ~20s). And then you do it again on case_data (average 3ms each * 4906 loops = ~15s).

How big are litigant_details and case_data? If they can fit in RAM, preload them using methods like the "cat to /dev/null" trick, and those table lookups will be ~100-1000x faster. If they won't fit into RAM but the machine can be expanded to hold enough RAM to fit the tables, it's well worth the ~$75-$150/GB to upgrade the server so that the tables will fit into RAM.

If they can't be made to fit into RAM as atomic entities, you have a few choices:

A= Put the data tables and indexes on separate dedicated spindles, and put litigant_details and case_data each on their own dedicated spindles. This will lower seek conflicts. Again, if this requires buying some more HDs, it's well worth it.

B= Break litigant_details and case_data into a set of smaller tables (based on something sane like the first n characters of the primary key) such that the smaller tables easily fit into RAM. Given that you've said only 10GB/60GB is "hot", this could work very well. Combine it with "A" above (put all the litigant_details sub-tables on one dedicated spindle set and all the case_data sub-tables on another spindle set) for added oomph.
C= Buy an SSD big enough to hold litigant_details and case_data and put them there. Again, this can be combined with "A" and "B" above to lessen the size of the SSD needed.

So there is no need for preloading your indexes on the identity table. It is definitely not the bottleneck.

So a few design bits, which may help your database.
Why is "actor_id" a text field instead of a number?
You could try creating an index on "litigant_details (actor_id, court_ori)" so that it can do just an index lookup, rather than an index + filter.

Yes, that certainly sounds like it would be more efficient.

More importantly, though, the planner seems to think the join of identity to litigant_details will only return 1 row, not 5000. Do you regularly vacuum analyze your tables? Just as a test, try running:

set enable_nested_loop to off;

And then run EXPLAIN ANALYZE again, just to see if it is faster. You probably need to increase some statistics targets, so that the planner can design better plans.

> -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
>   -> Nested Loop (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
>     -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
>       Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
>       -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
>         Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
>         Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
>       -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
>         Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
>         Filter: ('IL081025J'::text = (court_ori)::text)
>     -> Seq Scan on
court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
>       Filter: ('IL081025J'::text = (id)::text)
>   -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
>     Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))

---(end of broadcast)---
TIP 6: explain analyze is your friend
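Option B above (splitting the big tables by a prefix of the primary key) could be sketched like this. The table and column names come from the schema shown in this thread, but the prefix list and the LIKE-based split key are purely hypothetical illustrations, not something the poster specified:

```shell
# Hypothetical sketch of option B: generate DDL splitting case_data into
# per-prefix child tables. Prefixes and the CHECK condition are assumptions.
for prefix in A B C; do
  cat <<SQL
CREATE TABLE case_data_${prefix} (
  CHECK (case_id LIKE '${prefix}%')   -- documents which slice of keys lives here
) INHERITS (case_data);
SQL
done > split_case_data.sql
cat split_case_data.sql
```

The application (or a set of views) would then route queries to the right sub-table so each working set stays small enough to live in RAM.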
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote:
>here's an example standard query. I really have to make the first hit go
>faster. The table is clustered on full_name as well. 'Smith%'
>took 87 seconds on the first hit. I wonder if I set up my array wrong.
>I remember seeing something about DMA access versus something else, and
>chose DMA access. LVM maybe?

It would be nice if you would format your queries to be a little bit easier to read before posting them. However, I believe I am reading it correctly, to say that the index scan on identity is not your slow point. In fact, as near as I can tell, it only takes 52ms to complete.

The expensive parts are the 4915 lookups into litigant_details (each one takes approx 4ms for a total of ~20s). And then you do it again on case_data (average 3ms each * 4906 loops = ~15s).

So there is no need for preloading your indexes on the identity table. It is definitely not the bottleneck.

So a few design bits, which may help your database. Why is "actor_id" a text field instead of a number? You could try creating an index on "litigant_details (actor_id, court_ori)" so that it can do just an index lookup, rather than an index + filter.

More importantly, though, the planner seems to think the join of identity to litigant_details will only return 1 row, not 5000. Do you regularly vacuum analyze your tables? Just as a test, try running:

set enable_nested_loop to off;

And then run EXPLAIN ANALYZE again, just to see if it is faster. You probably need to increase some statistics targets, so that the planner can design better plans.
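The statistics-target advice could look roughly like this in 8.0-era syntax. The column choice and the target value are assumptions for illustration, since the post doesn't name them; the join column on identity is a plausible candidate given the misestimate:

```shell
# Hedged sketch: raise the planner's statistics target on the join column,
# then re-analyze so the new target takes effect. Values are guesses.
cat > stats.sql <<'SQL'
ALTER TABLE identity ALTER COLUMN actor_id SET STATISTICS 200;  -- default is 10
ANALYZE identity;
SQL
cat stats.sql
# Apply with (not run here): psql -d copa -f stats.sql
```

A larger target makes ANALYZE keep more histogram buckets and most-common-value entries, which is what lets the planner see that the join returns ~5000 rows rather than 1.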
> -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
>   -> Nested Loop (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
>     -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
>       Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
>       -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
>         Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
>         Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
>       -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
>         Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
>         Filter: ('IL081025J'::text = (court_ori)::text)
>     -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
>       Filter: ('IL081025J'::text = (id)::text)
>   -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
>     Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))

John =:->
Re: [PERFORM] extremly low memory usage
here's an example standard query. I really have to make the first hit go faster. The table is clustered on full_name as well. 'Smith%' took 87 seconds on the first hit. I wonder if I set up my array wrong. I remember seeing something about DMA access versus something else, and chose DMA access. LVM maybe?

explain analyze select distinct case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data, to_number(trim(leading case_data.type_code from trim(leading case_data.case_year from case_data.case_id)),'99') as seq
from identity,court,litigant_details,case_data
where identity.court_ori = litigant_details.court_ori and identity.case_id = litigant_details.case_id and identity.actor_id = litigant_details.actor_id and court.id = identity.court_ori and identity.court_ori = case_data.court_ori and case_data.case_id = identity.case_id and identity.court_ori = 'IL081025J' and full_name like 'MILLER%'
order by full_name;

QUERY PLAN
---
Unique (cost=20411.84..20411.91 rows=2 width=173) (actual time=38340.231..38355.120 rows=4906 loops=1)
  -> Sort (cost=20411.84..20411.84 rows=2 width=173) (actual time=38340.227..38343.667 rows=4906 loops=1)
    Sort Key: identity.full_name, case_data.case_category, identity.identity_id, court.name, litigant_details.case_id, case_data.case_year, identity.date_of_birth, litigant_details.assigned_case_role, litigant_details.court_ori, litigant_details.actor_id, case_data.type_code, case_data.subtype_code, litigant_details.impound_litigant_data, to_number(ltrim(ltrim((case_data.case_id)::text, (case_data.case_year)::text), (case_data.type_code)::text), '99'::text)
    -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
      -> Nested Loop (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906
loops=1)
        -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
          Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
          -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
            Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
            Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
          -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
            Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
            Filter: ('IL081025J'::text = (court_ori)::text)
        -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
          Filter: ('IL081025J'::text = (id)::text)
      -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
        Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))
Total runtime: 38359.722 ms
(18 rows)

copa=> explain analyze select distinct case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data, to_number(trim(leading case_data.type_code from trim(leading case_data.case_year from case_data.case_id)),'99') as seq from identity,court,litigant_details,case_data where identity.court_ori = litigant_details.court_ori and identity.case_id = litigant_details.case_id and identity.actor_id = litigant_details.actor_id and court.id = identity.court_ori and identity.court_ori = c
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote:
>On Wed, 2005-08-17 at 21:21 -0500, John A Meinel wrote:
>>Jeremiah Jahn wrote:
>>>I just put together a system with 6GB of ram on a 14 disk raid 10 array.
>>>When I run my usual big painful queries, I get very little to no
>>>memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used
>>>most of the time. the new devel box sits at around 250MB.
>>>
>>>I've switched to an 8.0 system on the new devel box, but the .conf
>>>really didn't change. Index usage is the same. Something seems wrong and
>>>I'm not sure why.
>>
>>How big is your actual database on disk? And how much of it is actually
>>touched by your queries?
>
>The DB is about 60GB. About 10GB is actually used in real queries,
>versus "get me this single record with this ID". I have a large query that
>finds court cases based on certain criteria that is name based. I get a
>full seq scan on the name table in about 7 seconds; this table has about
>6 million names (most being 'smith, something'). The index scan takes
>much less time of course, once it's been cached (somewhere, but not
>apparently memory). The real query can take 60 seconds on a first run,
>and 1.3 seconds on a second run. I'm very happy with the cached results,
>just not really sure where that caching is happening, since it doesn't
>show up as memory usage. I do know that the caching that happens seems
>to be independent of the DB: I can restart the DB and my speeds are
>still the same as the cached second query. Is there some way to
>pre-cache some of the tables/files on the file system? If I switch my
>query to search for 'jones%' instead of 'smith%', I take a hit. But if I
>then rerun the smith search, I still get cached speed. I only have two
>tables, essentially names and events, that have to do any real work, i.e.
>not very atomic data. I'd love to be able to force these two tables into
>a cache somewhere. This is a linux system (RHEL ES4) by the way.
I think what is happening is that *some* of the index pages are being cached, just not all of them. Most indexes (if you didn't specify anything special) are btrees, so you load the root page and then determine what pages need to be loaded from there. So the "jones%" pages aren't anywhere near the "smith%" pages, and they don't need to be loaded if you aren't accessing them. So the required memory usage might be smaller than you think, at least until all of the index pages have been accessed.

The reason it is DB independent is that the OS is caching file access (you read a file, it keeps the old pages in RAM in case you ask for them again).

Part of the trick is that as you use the database, it will cache what has been used. So you may not need to do anything; it should sort itself out with time. However, if you have to have cached performance as soon as your machine reboots, you could figure out what files on disk represent your indexes and tables, and then just "cat $files >/dev/null". That should cause a read on those files, which should pull them into the memory cache. *However*, this will fail if the size of those files is greater than available memory, so you may want to be a little bit stingy about what you preload.

Alternatively, you could just write an SQL script which runs a bunch of indexed queries to make sure all the pages get loaded. Something like:

FOR curname IN SELECT DISTINCT name FROM users LOOP
    SELECT name FROM users WHERE name = curname;
END LOOP;

(In actual PL/pgSQL the inner query would need to be a PERFORM, since its result is discarded.) That should make the database go through the entire table and load the index for every user. This is overkill, and will probably take a long time to execute. But you could do it if you wanted.

>>It seems that your tough queries might only be exercising a portion of
>>the database. If you really want to make memory usage increase try
>>something like:
>>find . -type f -print0 | xargs -0 cat >/dev/null
>>Which should read all the files. After doing that, does the memory usage
>>increase?
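A minimal, self-contained sketch of the preloading trick described above, run against a scratch directory so it is safe anywhere. For the real case, DIR would point at the table and index files under $PGDATA/base, assuming they fit in RAM:

```shell
# Demo of the "cat everything to /dev/null" preload; uses scratch files.
# Reading a file once pulls its pages into the OS page cache.
DIR=$(mktemp -d)
dd if=/dev/zero of="$DIR/table_file" bs=1M count=8 2>/dev/null
dd if=/dev/zero of="$DIR/index_file" bs=1M count=8 2>/dev/null
find "$DIR" -type f -print0 | xargs -0 cat >/dev/null
N=$(find "$DIR" -type f | wc -l)
echo "preloaded $N files"
rm -rf "$DIR"
```

As noted above, this only helps if the total size of the preloaded files is comfortably below free RAM; otherwise the later files simply evict the earlier ones.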
>>>any thoughts,
>>>-jj-
>>>
>>>shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each
>>>work_mem = 2097151 # min 64, size in KB
>>
>>This seems awfully high. 2GB per sort? This might actually be flushing
>>some of your ram, since it would get allocated and filled, and then
>>freed when finished. Remember, depending on what you are doing, this
>>amount can get allocated more than once per query.
>
>What's a good way to determine the optimal size?

Practice. :) A few questions, I guess... How many concurrent connections are you expecting? How many joins does a standard query have? How big are the joins?

In general, I would tend to make this a smaller number, so that the OS has more room to cache tables, rather than having big buffers for joins. If someone is requesting a join that requires a lot of rows, I would rather *that* query be slower than have it impact everyone else. I would put it more with a maximum in the 20-100MB range.
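The sizing questions above boil down to a worst-case memory budget: concurrent connections, times sort/hash operations per query, times work_mem. A back-of-envelope sketch with assumed numbers (none of these inputs come from the thread except the 6GB box):

```shell
# Back-of-envelope work_mem budget; all inputs below are illustrative guesses.
RAM_MB=6144            # the 6GB devel box
CONNECTIONS=50         # assumed concurrent backends
SORTS_PER_QUERY=3      # each sort/hash node in a plan can claim its own work_mem
WORK_MEM_MB=32         # candidate setting (vs. the ~2GB in the posted conf)
WORST_CASE=$(( CONNECTIONS * SORTS_PER_QUERY * WORK_MEM_MB ))
echo "worst case: ${WORST_CASE} MB of ${RAM_MB} MB RAM"
```

Even at a modest 32MB, fifty busy connections can theoretically claim most of the machine; with the posted 2GB setting, a single multi-sort query could exhaust RAM on its own, which is why the advice above is to keep the default small and raise it per-session for the rare big query.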
Re: [PERFORM] extremly low memory usage
On Wed, 2005-08-17 at 21:21 -0500, John A Meinel wrote:
> Jeremiah Jahn wrote:
> > I just put together a system with 6GB of ram on a 14 disk raid 10 array.
> > When I run my usual big painful queries, I get very little to no
> > memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used
> > most of the time. the new devel box sits at around 250MB.
> >
> > I've switched to an 8.0 system on the new devel box, but the .conf
> > really didn't change. Index usage is the same. Something seems wrong and
> > I'm not sure why.
>
> How big is your actual database on disk? And how much of it is actually
> touched by your queries?

The DB is about 60GB. About 10GB is actually used in real queries, versus "get me this single record with this ID". I have a large query that finds court cases based on certain criteria that is name based. I get a full seq scan on the name table in about 7 seconds; this table has about 6 million names (most being 'smith, something'). The index scan takes much less time of course, once it's been cached (somewhere, but not apparently memory). The real query can take 60 seconds on a first run, and 1.3 seconds on a second run. I'm very happy with the cached results, just not really sure where that caching is happening, since it doesn't show up as memory usage. I do know that the caching that happens seems to be independent of the DB: I can restart the DB and my speeds are still the same as the cached second query. Is there some way to pre-cache some of the tables/files on the file system? If I switch my query to search for 'jones%' instead of 'smith%', I take a hit. But if I then rerun the smith search, I still get cached speed. I only have two tables, essentially names and events, that have to do any real work, i.e. not very atomic data. I'd love to be able to force these two tables into a cache somewhere. This is a linux system (RHEL ES4) by the way.

> It seems that your tough queries might only be exercising a portion of
> the database.
> If you really want to make memory usage increase try
> something like:
> find . -type f -print0 | xargs -0 cat >/dev/null
> Which should read all the files. After doing that, does the memory usage
> increase?
>
> > any thoughts,
> > -jj-
> >
> > shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each
> > work_mem = 2097151 # min 64, size in KB
>
> This seems awfully high. 2GB Per sort? This might actually be flushing
> some of your ram, since it would get allocated and filled, and then
> freed when finished. Remember, depending on what you are doing, this
> amount can get allocated more than once per query.

What's a good way to determine the optimal size?

> > maintenance_work_mem = 819200 # min 1024, size in KB
> > max_fsm_pages = 8 # min max_fsm_relations*16, 6 bytes each
> > checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
> > effective_cache_size = 360 <- this is a little out of control, but
> > would it have any real effect?
>
> It should just tell the planner that it is more likely to have buffers
> in cache, so index scans are slightly cheaper than they would otherwise be.
>
> > random_page_cost = 2 # units are one sequential page fetch cost
> > log_min_duration_statement = 1 # -1 is disabled, in milliseconds.
> > lc_messages = 'C' # locale for system error message strings
> > lc_monetary = 'C' # locale for monetary formatting
> > lc_numeric = 'C' # locale for number formatting
> > lc_time = 'C' # locale for time formatting
>
> John
> =:->

--
"Now this is a totally brain damaged algorithm. Gag me with a smurfette."
-- P. Buhr, Computer Science 354

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] extremly low memory usage
On Aug 17, 2005, at 10:11 PM, Jeremiah Jahn wrote:

> I just put together a system with 6GB of ram on a 14 disk raid 10 array.
> When I run my usual big painful queries, I get very little to no
> memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used
> most of the time. the new devel box sits at around 250MB.

Is the system performing fine? Are you touching as much data as the production box? If the system is performing fine, don't worry about it.

> work_mem = 2097151 # min 64, size in KB

This is EXTREMELY high. You realize this is the amount of memory that can be used per-sort and per-hash-build in a query? You can end up with multiples of this on a single query. If you have some big queries that are run infrequently, have them set it manually.

> effective_cache_size = 360 <- this is a little out of control, but
> would it have any real effect?

This doesn't allocate anything - it is a hint to the planner about how much data it can assume is cached.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote:
> I just put together a system with 6GB of ram on a 14 disk raid 10 array.
> When I run my usual big painful queries, I get very little to no
> memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used
> most of the time. the new devel box sits at around 250MB.
>
> I've switched to an 8.0 system on the new devel box, but the .conf
> really didn't change. Index usage is the same. Something seems wrong and
> I'm not sure why.

How big is your actual database on disk? And how much of it is actually touched by your queries?

It seems that your tough queries might only be exercising a portion of the database. If you really want to make memory usage increase, try something like:

find . -type f -print0 | xargs -0 cat >/dev/null

Which should read all the files. After doing that, does the memory usage increase?

> any thoughts,
> -jj-
>
> shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each
> work_mem = 2097151 # min 64, size in KB

This seems awfully high. 2GB per sort? This might actually be flushing some of your ram, since it would get allocated and filled, and then freed when finished. Remember, depending on what you are doing, this amount can get allocated more than once per query.

> maintenance_work_mem = 819200 # min 1024, size in KB
> max_fsm_pages = 8 # min max_fsm_relations*16, 6 bytes each
> checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
> effective_cache_size = 360 <- this is a little out of control, but
> would it have any real effect?

It should just tell the planner that it is more likely to have buffers in cache, so index scans are slightly cheaper than they would otherwise be.

> random_page_cost = 2 # units are one sequential page fetch cost
> log_min_duration_statement = 1 # -1 is disabled, in milliseconds.
> lc_messages = 'C' # locale for system error message strings
> lc_monetary = 'C' # locale for monetary formatting
> lc_numeric = 'C' # locale for number formatting
> lc_time = 'C' # locale for time formatting

John =:->
[PERFORM] extremly low memory usage
I just put together a system with 6GB of ram on a 14 disk raid 10 array. When I run my usual big painful queries, I get very little to no memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used most of the time. The new devel box sits at around 250MB.

I've switched to an 8.0 system on the new devel box, but the .conf really didn't change. Index usage is the same. Something seems wrong and I'm not sure why.

any thoughts,
-jj-

shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each
work_mem = 2097151 # min 64, size in KB
maintenance_work_mem = 819200 # min 1024, size in KB
max_fsm_pages = 8 # min max_fsm_relations*16, 6 bytes each
checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
effective_cache_size = 360 <- this is a little out of control, but would it have any real effect?
random_page_cost = 2 # units are one sequential page fetch cost
log_min_duration_statement = 1 # -1 is disabled, in milliseconds.
lc_messages = 'C' # locale for system error message strings
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting

--
"Now this is a totally brain damaged algorithm. Gag me with a smurfette."
-- P. Buhr, Computer Science 354