Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote: > On Sun, 2005-08-21 at 16:13 -0400, Ron wrote: > >>At 10:54 AM 8/21/2005, Jeremiah Jahn wrote: >> ... >>So you have 2 controllers each with 2 external slots? But you are >>currently only using 1 controller and only one external slot on that >>controller? > > > Sorry, no. I have one dual channel card in the system and two > controllers on the array. Dell PowerVault 220S w/ PERC4eDC-PCI Express > > ... >>Regardless of the above, each of these controllers should still be >>good for about 80-85% of 640MB/s, or ~510-540 MB/s apiece when doing >>raw sequential IO if you plug 3-4 fast enough HD's into each SCSI >>channel. Cheetah 15K.4's certainly are fast enough. Optimal setup >>is probably to split each RAID 1 pair so that one HD is on each of >>the SCSI channels, and then RAID 0 those pairs. That will also >>protect you from losing the entire disk subsystem if one of the SCSI >>channels dies. > > I like this idea, but how exactly does one bond the two channels > together? Won't this cause me to have both a /dev/sdb and a /dev/sdc? > Well, even if you did, you could always either use software raid, or lvm to turn it into a single volume. It also depends on what the controller card BIOS will let you get away with. Some cards would let you set up 4 RAID1's (one drive from each channel), and then create a RAID0 of those pairs. Software raid should do this without any problem. It can even be done such that it can be grown in the future, as well as work across multiple cards (though the latter is supported by some cards as well). > > >>That 128MB of buffer cache may very well be too small to keep the IO >>rate up, and/or there may be a more subtle problem with the LSI card, >>and/or you may have a configuration problem, but _something(s)_ need >>fixing since you are only getting raw sequential IO of ~100-150MB/s >>when it should be above 500MB/s. > > > It looks like there's a way to add more memory to it. 
This memory probably helps more in writing than reading. If you are reading the same area over and over, it might end up being a little bit of extra cache for that (but it should already be cached in system RAM, so you don't really get anything). ... >>Initial reads are only going to be as fast as your HD subsystem, so >>there's a reason for making the HD subsystem faster even if all you >>care about is reads. In addition, I'll repeat my previous advice >>that upgrading to 16GB of RAM would be well worth it for you. > > > 12GB is my max. I may run with it for a while and see. If your working set truly is 10GB, then you can get a massive performance increase even at 12GB. If your working set is 10GB and you have 6GB of RAM, it probably is always swapping out what it just read for the new stuff, even though you will read that same thing again in a few seconds. So rather than just paying for the 4GB that can't be cached, you pay for the whole 10. John =:-> > > >>Hope this helps, >>Ron Peacetree >> >> >> >>---(end of broadcast)--- >>TIP 9: In versions below 8.0, the planner will ignore your desire to >> choose an index scan if your joining column's datatypes do not >> match
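To make the "bond the two channels" idea from the exchange above concrete: with Linux software RAID you expose one drive per SCSI channel as separate devices, mirror across the channels, then stripe the mirrors. A hypothetical sketch — all device names are placeholders, sdb/sdd assumed on channel 0 and sdc/sde on channel 1, and these commands destroy data on those drives:

```shell
# RAID1 pairs, each with one member per SCSI channel, so a dead
# channel costs one half of each mirror rather than a whole array:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd /dev/sde

# RAID0 across the mirrors -- the "split each RAID 1 pair, then
# RAID 0 those pairs" layout Ron describes (chunk size in KB):
mdadm --create /dev/md2 --level=0 --chunk=64 --raid-devices=2 \
      /dev/md0 /dev/md1

# One volume, so no separate /dev/sdb and /dev/sdc to worry about:
mkfs.ext3 /dev/md2        # or hand /dev/md2 to LVM as a single PV
```

Newer mdadm can also build the same thing in one step with --level=10, but the two-step form above makes the channel pairing explicit.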
Re: [PERFORM] extremly low memory usage
Ron wrote: PERC4eDC-PCI Express, 128MB Cache, 2-External Channels Looks like they are using the LSI Logic MegaRAID SCSI 320-2E controller. IIUC, you have 2 of these, each with 2 external channels? A lot of people have mentioned Dell's versions of the LSI cards can be WAY slower than the ones you buy from LSI. Why is this the case? Nobody knows for sure. Here's a guess on my part. A while back, I was doing some googling. And instead of typing "LSI MegaRAID xxx", I just typed "MegaRAID xxx". Going beyond the initial pages, I saw Tekram -- a company that supposedly produces their own controllers -- listing products with the exact same model numbers and photos as cards from LSI and Areca. Seemed puzzling until I read a review about SATA RAID cards where it mentioned Tekram produces the Areca cards under their own name but using slower components to avoid competing at the high end with them. So what may be happening is that while the logic circuitry on the Dell PERCs is the same as on the source LSI cards, the speed of the RAID processor/RAM/internal buffers/etc. is not as fast, so Dell can shave off a few bucks on every server. That would mean that while a true LSI card has the processing power to do the RAID calculations for X drives, the Dell version probably can only do X*0.6 drives or so. The 128MB buffer also looks suspiciously small, and I do not see any upgrade path for it on LSI Logic's site. "Serious" RAID controllers from companies like Xyratex, Engenio, and Dot Hill can have up to 1-2GB. The card is upgradable. If you look at the pic of the card, it shows an SDRAM DIMM versus integrated RAM chips. I've also read reviews a while back comparing benchmarks of the 320-2 w/ 128MB versus 512MB onboard RAM. Their product literature is just nebulous on the RAM upgrade part. 
I'm sure if you opened up the PDF manuals, you could find the exact info. That 128MB of buffer cache may very well be too small to keep the IO rate up, and/or there may be a more subtle problem with the LSI card, and/or you may have a configuration problem, but _something(s)_ need fixing since you are only getting raw sequential IO of ~100-150MB/s when it should be above 500MB/s. I think it just might be the Dell hardware or the lack of 64-bit IOMMU on Xeons. Here are my numbers on a 320-1 w/ 128MB paired up with Opterons, compared to Jeremiah's. >> # time dd if=/dev/zero of=testfile bs=1024 count=1000000 >> 1000000+0 records in >> 1000000+0 records out >> >> real 0m8.885s >> user 0m0.299s >> sys 0m6.998s 2x15K RAID1 real 0m14.493s user 0m0.255s sys 0m11.712s 6x15K RAID10 (2x 320-1) real 0m9.986s user 0m0.200s sys 0m8.634s >> # time dd of=/dev/null if=testfile bs=1024 count=1000000 >> 1000000+0 records in >> 1000000+0 records out >> >> real 0m1.654s >> user 0m0.232s >> sys 0m1.415s 2x15K RAID1 real 0m3.383s user 0m0.176s sys 0m3.207s 6x15K RAID10 (2x 320-1) real 0m2.427s user 0m0.178s sys 0m2.250s If all 14 HDs are arranged in a RAID10 array, I'd say there's definitely something wrong with Jeremiah's hardware.
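The dd comparisons traded back and forth in this thread can be wrapped into a small script that reports MB/s directly. A rough sketch — the file size here is a tiny placeholder so it runs anywhere; for a meaningful result use a file at least twice the machine's RAM so the page cache can't satisfy the reads:

```shell
# Sequential write-then-read throughput check with dd.
SIZE_MB=32                 # placeholder; use >= 2x RAM on real hardware
dd if=/dev/zero of=dd_testfile bs=1M count=$SIZE_MB conv=fdatasync 2>/dev/null

# Time the read back.  (To truly defeat the cache on Linux, also do
# "echo 3 > /proc/sys/vm/drop_caches" as root between write and read.)
t0=$(date +%s.%N)
dd if=dd_testfile of=/dev/null bs=1M 2>/dev/null
t1=$(date +%s.%N)

rate=$(awk -v mb="$SIZE_MB" -v a="$t0" -v b="$t1" \
    'BEGIN { printf "%.1f", mb / (b - a) }')
echo "sequential read: ${rate} MB/s"
rm -f dd_testfile
```

conv=fdatasync makes the write timing include flushing to disk, so the write number isn't just the speed of dirtying page cache.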
Re: [PERFORM] extremly low memory usage
On Sun, 2005-08-21 at 16:13 -0400, Ron wrote: > At 10:54 AM 8/21/2005, Jeremiah Jahn wrote: > >On Sat, 2005-08-20 at 21:32 -0500, John A Meinel wrote: > > > Ron wrote: > > > > > > Well, since you can get a read of the RAID at 150MB/s, that means that > > > it is actual I/O speed. It may not be cached in RAM. Perhaps you could > > > try the same test, only using say 1G, which should be cached. > > > >[EMAIL PROTECTED] pgsql]# time dd if=/dev/zero of=testfile bs=1024 > >count=1000000 > >1000000+0 records in > >1000000+0 records out > > > >real 0m8.885s > >user 0m0.299s > >sys 0m6.998s > > This is abysmally slow. > > > >[EMAIL PROTECTED] pgsql]# time dd of=/dev/null if=testfile bs=1024 > >count=1000000 > >1000000+0 records in > >1000000+0 records out > > > >real 0m1.654s > >user 0m0.232s > >sys 0m1.415s > > This transfer rate is the only one out of the 4 you have posted that > is in the vicinity of where it should be. > > > >The raid array I have is currently set up to use a single channel. But I > >have dual controllers in the array. And dual external slots on the card. > >The machine is brand new and has pci-e backplane. > > > So you have 2 controllers each with 2 external slots? But you are > currently only using 1 controller and only one external slot on that > controller? Sorry, no. I have one dual channel card in the system and two controllers on the array. Dell PowerVault 220S w/ PERC4eDC-PCI Express > > > > > > Assuming these are U320 15Krpm 147GB HDs, a RAID 10 array of 14 of them > > > > doing raw sequential IO like this should be capable of > > > > ~7*75MB/s= 525MB/s using Seagate Cheetah 15K.4's > >BTW I'm using Seagate Cheetah 15K.4's > > OK, now we have that nailed down. > > > > > > AFAICT, the Dell PERC4 controllers use various flavors of the LSI Logic > > > > MegaRAID controllers. What I don't know is which exact one yours is, > > > > nor do I know if it (or any of the MegaRAID controllers) are high > > > > powered enough. 
> > > >PERC4eDC-PCI Express, 128MB Cache, 2-External Channels > > Looks like they are using the LSI Logic MegaRAID SCSI 320-2E > controller. IIUC, you have 2 of these, each with 2 external channels? > > The specs on these appear a bit strange. They are listed as being a > PCI-Ex8 card, which means they should have a max bandwidth of 20Gb/s= > 2GB/s, yet they are also listed as only supporting dual channel U320= > 640MB/s when they could easily support quad channel U320= > 1.28GB/s. Why bother building a PCI-Ex8 card when only a PCI-Ex4 > card (which is a more standard physical format) would've been > enough? Or if you are going to build a PCI-Ex8 card, why not support > quad channel U320? This smells like there's a problem with LSI's design. > > The 128MB buffer also looks suspiciously small, and I do not see any > upgrade path for it on LSI Logic's site. "Serious" RAID controllers > from companies like Xyratex, Engenio, and Dot Hill can have up to > 1-2GB of buffer, and there are sound technical reasons for it. See if > there's a buffer upgrade available or if you can get controllers that > have larger buffer capabilities. > > Regardless of the above, each of these controllers should still be > good for about 80-85% of 640MB/s, or ~510-540 MB/s apiece when doing > raw sequential IO if you plug 3-4 fast enough HD's into each SCSI > channel. Cheetah 15K.4's certainly are fast enough. Optimal setup > is probably to split each RAID 1 pair so that one HD is on each of > the SCSI channels, and then RAID 0 those pairs. That will also > protect you from losing the entire disk subsystem if one of the SCSI > channels dies. I like this idea, but how exactly does one bond the two channels together? Won't this cause me to have both a /dev/sdb and a /dev/sdc? 
> > That 128MB of buffer cache may very well be too small to keep the IO > rate up, and/or there may be a more subtle problem with the LSI card, > and/or you may have a configuration problem, but _something(s)_ need > fixing since you are only getting raw sequential IO of ~100-150MB/s > when it should be above 500MB/s. It looks like there's a way to add more memory to it. > > This will make the most difference for initial reads (first time you > load a table, first time you make a given query, etc) and for any writes. > > Your HW provider should be able to help you, even if some of the HW > in question needs to be changed. You paid for a solution. As long > as this stuff is performing at so much less than what it is supposed > to, you have not received the solution you paid for. > > BTW, on the subject of RAID stripes IME the sweet spot tends to be in > the 64KB to 256KB range (very large, very read heavy data mines can > want larger RAID stripes.). Only experimentation will tell you what > results in the best performance for your application. I think I have them very small at the moment. > > > >I'm not really worried about the writing, it's the reading that needs to be faster.
Re: [PERFORM] extremly low memory usage
At 10:54 AM 8/21/2005, Jeremiah Jahn wrote: On Sat, 2005-08-20 at 21:32 -0500, John A Meinel wrote: > Ron wrote: > > Well, since you can get a read of the RAID at 150MB/s, that means that > it is actual I/O speed. It may not be cached in RAM. Perhaps you could > try the same test, only using say 1G, which should be cached. [EMAIL PROTECTED] pgsql]# time dd if=/dev/zero of=testfile bs=1024 count=1000000 1000000+0 records in 1000000+0 records out real 0m8.885s user 0m0.299s sys 0m6.998s This is abysmally slow. [EMAIL PROTECTED] pgsql]# time dd of=/dev/null if=testfile bs=1024 count=1000000 1000000+0 records in 1000000+0 records out real 0m1.654s user 0m0.232s sys 0m1.415s This transfer rate is the only one out of the 4 you have posted that is in the vicinity of where it should be. The raid array I have is currently set up to use a single channel. But I have dual controllers in the array. And dual external slots on the card. The machine is brand new and has pci-e backplane. So you have 2 controllers each with 2 external slots? But you are currently only using 1 controller and only one external slot on that controller? > > Assuming these are U320 15Krpm 147GB HDs, a RAID 10 array of 14 of them > > doing raw sequential IO like this should be capable of > > ~7*75MB/s= 525MB/s using Seagate Cheetah 15K.4's BTW I'm using Seagate Cheetah 15K.4's OK, now we have that nailed down. > > AFAICT, the Dell PERC4 controllers use various flavors of the LSI Logic > > MegaRAID controllers. What I don't know is which exact one yours is, > > nor do I know if it (or any of the MegaRAID controllers) are high > > powered enough. PERC4eDC-PCI Express, 128MB Cache, 2-External Channels Looks like they are using the LSI Logic MegaRAID SCSI 320-2E controller. IIUC, you have 2 of these, each with 2 external channels? The specs on these appear a bit strange. 
They are listed as being a PCI-Ex8 card, which means they should have a max bandwidth of 20Gb/s= 2GB/s, yet they are also listed as only supporting dual channel U320= 640MB/s when they could easily support quad channel U320= 1.28GB/s. Why bother building a PCI-Ex8 card when only a PCI-Ex4 card (which is a more standard physical format) would've been enough? Or if you are going to build a PCI-Ex8 card, why not support quad channel U320? This smells like there's a problem with LSI's design. The 128MB buffer also looks suspiciously small, and I do not see any upgrade path for it on LSI Logic's site. "Serious" RAID controllers from companies like Xyratex, Engenio, and Dot Hill can have up to 1-2GB of buffer, and there are sound technical reasons for it. See if there's a buffer upgrade available or if you can get controllers that have larger buffer capabilities. Regardless of the above, each of these controllers should still be good for about 80-85% of 640MB/s, or ~510-540 MB/s apiece when doing raw sequential IO if you plug 3-4 fast enough HD's into each SCSI channel. Cheetah 15K.4's certainly are fast enough. Optimal setup is probably to split each RAID 1 pair so that one HD is on each of the SCSI channels, and then RAID 0 those pairs. That will also protect you from losing the entire disk subsystem if one of the SCSI channels dies. That 128MB of buffer cache may very well be too small to keep the IO rate up, and/or there may be a more subtle problem with the LSI card, and/or you may have a configuration problem, but _something(s)_ need fixing since you are only getting raw sequential IO of ~100-150MB/s when it should be above 500MB/s. This will make the most difference for initial reads (first time you load a table, first time you make a given query, etc) and for any writes. Your HW provider should be able to help you, even if some of the HW in question needs to be changed. You paid for a solution. 
As long as this stuff is performing at so much less than what it is supposed to, you have not received the solution you paid for. BTW, on the subject of RAID stripes IME the sweet spot tends to be in the 64KB to 256KB range (very large, very read-heavy data mines can want larger RAID stripes). Only experimentation will tell you what results in the best performance for your application. I'm not really worried about the writing, it's the reading that needs to be faster. Initial reads are only going to be as fast as your HD subsystem, so there's a reason for making the HD subsystem faster even if all you care about is reads. In addition, I'll repeat my previous advice that upgrading to 16GB of RAM would be well worth it for you. Hope this helps, Ron Peacetree
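"Only experimentation will tell you" about stripe size translates, for the software-RAID alternative discussed elsewhere in this thread, into a simple sweep. A hypothetical sketch — /dev/md0 and the member drives are placeholders, the loop destroys any data on them, and oflag=direct needs a reasonably recent GNU dd:

```shell
# Sweep Ron's suggested 64KB-256KB chunk-size range and time a raw
# sequential write at each setting (add a matching read test to taste).
for chunk in 64 128 256; do
    mdadm --stop /dev/md0 2>/dev/null
    mdadm --create --run /dev/md0 --level=0 --chunk=$chunk \
          --raid-devices=2 /dev/sdb /dev/sdc
    echo "chunk ${chunk}KB:"
    # oflag=direct bypasses the page cache so the array, not RAM,
    # is what gets measured:
    dd if=/dev/zero of=/dev/md0 bs=1M count=4096 oflag=direct 2>&1 | tail -1
done
```

The winner depends on the workload's typical request size, which is why no fixed recommendation beats measuring.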
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote: On Sat, 2005-08-20 at 21:32 -0500, John A Meinel wrote: Ron wrote: At 02:53 PM 8/20/2005, Jeremiah Jahn wrote: Well, since you can get a read of the RAID at 150MB/s, that means that it is actual I/O speed. It may not be cached in RAM. Perhaps you could try the same test, only using say 1G, which should be cached. [EMAIL PROTECTED] pgsql]# time dd if=/dev/zero of=testfile bs=1024 count=1000000 1000000+0 records in 1000000+0 records out real 0m8.885s user 0m0.299s sys 0m6.998s [EMAIL PROTECTED] pgsql]# time dd of=/dev/null if=testfile bs=1024 count=1000000 1000000+0 records in 1000000+0 records out real 0m1.654s user 0m0.232s sys 0m1.415s The write time seems about the same (but you only have 128MB of write cache), but your read jumped up to 620MB/s. So your drives do seem to be giving you 150MB/s. ... I'm actually curious about PCI bus saturation at this point. Old 32-bit 33MHz PCI could only push 1Gbit = 100MB/s. Now, I'm guessing that this is a higher performance system. But I'm really surprised that your write speed is that close to your read speed. (100MB/s write, 150MB/s read). The raid array I have is currently set up to use a single channel. But I have dual controllers in the array. And dual external slots on the card. The machine is brand new and has pci-e backplane. Assuming these are U320 15Krpm 147GB HDs, a RAID 10 array of 14 of them doing raw sequential IO like this should be capable of ~7*75MB/s= 525MB/s using Seagate Cheetah 15K.4's, ~7*79MB/s= 553MB/s BTW I'm using Seagate Cheetah 15K.4's Now, are the numbers that Ron is quoting in megabytes or megabits? I'm guessing he knows what he is talking about, and is doing megabytes. 80MB/s sustained seems rather high for a hard-disk. Though this page: http://www.storagereview.com/articles/200411/20041116ST3146754LW_2.html Does seem to agree with that statement. (Between 56 and 93MB/s) And since U320 is a 320MB/s bus, it doesn't seem like anything there should be saturating. 
So why the low performance _IF_ the controller setup is high powered enough to keep that kind of IO rate up. This will require a controller or controllers providing dual channel U320 bandwidth externally and quad channel U320 bandwidth internally. IOW, it needs a controller or controllers talking 64b 133MHz PCI-X, reasonably fast DSP/CPU units, and probably a decent sized IO buffer as well. AFAICT, the Dell PERC4 controllers use various flavors of the LSI Logic MegaRAID controllers. What I don't know is which exact one yours is, nor do I know if it (or any of the MegaRAID controllers) are high powered enough. PERC4eDC-PCI Express, 128MB Cache, 2-External Channels Do you know which card it is? Does it look like this one: http://www.lsilogic.com/products/megaraid/megaraid_320_2e.html Judging by the 320 speed, and 2 external controllers, that is my guess. They at least claim a theoretical max of 2GB/s. Which makes you wonder why reading from RAM is only able to get throughput of 600MB/s. Did you run it multiple times? On my windows system, I get just under 550MB/s for what should be cached, copying from /dev/zero to /dev/null I get 2.4GB/s (though that might be a no-op). On a similar linux machine, I'm able to get 1200MB/s for a cached file. (And 3GB/s for a zero=>null copy). John =:-> Talk to your HW supplier to make sure you have controllers adequate to your HD's. ...and yes, your average access time will be in the 5.5ms - 6ms range when doing a physical seek. Even with RAID, you want to minimize seeks and maximize sequential IO when accessing them. Best to not go to HD at all ;-) Well, certainly, if you can get more into RAM, you're always better off. For writing, a battery-backed write cache, and for reading lots of system RAM. I'm not really worried about the writing, it's the reading that needs to be faster. Hope this helps, Ron Peacetree John =:->
Re: [PERFORM] extremly low memory usage
On Sat, 2005-08-20 at 21:32 -0500, John A Meinel wrote: > Ron wrote: > > At 02:53 PM 8/20/2005, Jeremiah Jahn wrote: > > > >> On Fri, 2005-08-19 at 16:03 -0500, John A Meinel wrote: > >> > Jeremiah Jahn wrote: > >> > > On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote: > >> > > > >> > >> > > >> > > it's cached alright. I'm getting a read rate of about 150MB/sec. I > >> would > >> > > have thought it would be faster with my raid setup. I think I'm > >> going to > >> > > scrap the whole thing and get rid of LVM. I'll just do a straight > >> ext3 > >> > > system. Maybe that will help. Still trying to get suggestions for a > >> > > stripe size. > >> > > > > Well, since you can get a read of the RAID at 150MB/s, that means that > it is actual I/O speed. It may not be cached in RAM. Perhaps you could > try the same test, only using say 1G, which should be cached. [EMAIL PROTECTED] pgsql]# time dd if=/dev/zero of=testfile bs=1024 count=1000000 1000000+0 records in 1000000+0 records out real 0m8.885s user 0m0.299s sys 0m6.998s [EMAIL PROTECTED] pgsql]# time dd of=/dev/null if=testfile bs=1024 count=1000000 1000000+0 records in 1000000+0 records out real 0m1.654s user 0m0.232s sys 0m1.415s > > >> > > >> > I don't think 150MB/s is out of the realm for a 14 drive array. 
> >> > How fast is time dd if=/dev/zero of=testfile bs=8192 count=1000000 > >> > > >> time dd if=/dev/zero of=testfile bs=8192 count=1000000 > >> 1000000+0 records in > >> 1000000+0 records out > >> > >> real 1m24.248s > >> user 0m0.381s > >> sys 0m33.028s > >> > >> > >> > (That should create an 8GB file, which is too big to cache everything) > >> > And then how fast is: > >> > time dd if=testfile of=/dev/null bs=8192 count=1000000 > >> > >> time dd if=testfile of=/dev/null bs=8192 count=1000000 > >> 1000000+0 records in > >> 1000000+0 records out > >> > >> real 0m54.139s > >> user 0m0.326s > >> sys 0m8.916s > >> > >> > >> and on a second run: > >> > >> real 0m55.667s > >> user 0m0.341s > >> sys 0m9.013s > >> > >> > >> > > >> > That should give you a semi-decent way of measuring how fast the RAID > >> > system is, since it should be too big to cache in ram. > >> > >> about 150MB/Sec. Is there no better way to make this go faster...? > > I'm actually curious about PCI bus saturation at this point. Old 32-bit > 33MHz pci could only push 1Gbit = 100MB/s. Now, I'm guessing that this > is a higher performance system. But I'm really surprised that your write > speed is that close to your read speed. (100MB/s write, 150MB/s read). The raid array I have is currently set up to use a single channel. But I have dual controllers in the array. And dual external slots on the card. The machine is brand new and has pci-e backplane. > > > > > Assuming these are U320 15Krpm 147GB HDs, a RAID 10 array of 14 of them > > doing raw sequential IO like this should be capable of > > ~7*75MB/s= 525MB/s using Seagate Cheetah 15K.4's, ~7*79MB/s= 553MB/s BTW I'm using Seagate Cheetah 15K.4's > > if using Fujitsu MAU's, and ~7*86MB/s= 602MB/s if using Maxtor Atlas 15K > > II's to devices external to the RAID array. > > I know I thought these were SATA drives, over 2 controllers. I could be > completely wrong, though. > > > > > _IF_ the controller setup is high powered enough to keep that kind of IO > > rate up. 
This will require a controller or controllers providing dual > > channel U320 bandwidth externally and quad channel U320 bandwidth > > internally. IOW, it needs a controller or controllers talking 64b > > 133MHz PCI-X, reasonably fast DSP/CPU units, and probably a decent sized > > IO buffer as well. > > > > AFAICT, the Dell PERC4 controllers use various flavors of the LSI Logic > > MegaRAID controllers. What I don't know is which exact one yours is, > > nor do I know if it (or any of the MegaRAID controllers) are high > > powered enough. PERC4eDC-PCI Express, 128MB Cache, 2-External Channels > > > > Talk to your HW supplier to make sure you have controllers adequate to > > your HD's. > > > > ...and yes, your average access time will be in the 5.5ms - 6ms range > > when doing a physical seek. > > Even with RAID, you want to minimize seeks and maximize sequential IO > > when accessing them. > > Best to not go to HD at all ;-) > > Well, certainly, if you can get more into RAM, you're always better off. > For writing, a battery-backed write cache, and for reading lots of > system RAM. I'm not really worried about the writing, it's the reading that needs to be faster. > > > > > Hope this helps, > > Ron Peacetree > > > > John > =:-> -- Speak softly and carry a +6 two-handed sword.
Re: [PERFORM] extremly low memory usage
I'm sorry that I wrote that the option would risk the log persistency with PostgreSQL; I should instead have asked how you have taken this into account. Tom Lane's email below convinces me that you have taken metadata-only journalling into account and still preserve the persistency of committed transactions. This means that ext3 with data=writeback is safe with PostgreSQL even with a hardware reset button. Metadata-only journalling is faster when it can be used. I didn't know that any database could keep its guarantees with the metadata-only journalling option. I looked at your problem. One of the problems is that you need to keep certain data cached in memory all the time. That could be solved by running SELECT COUNT(*) FROM to_be_cached; as a cron job. It loads the whole table into the Linux kernel page cache. Marko Ristola Tom Lane wrote: >Right. I think the optimal setting for a Postgres data directory is >journaled metadata, non-journaled file content. Postgres can take care >of the data integrity for itself, but it does assume that the filesystem >stays structurally sane (eg, data blocks don't get reassigned to the >wrong file), so you need a filesystem guarantee about the metadata. > >WAL files are handled in a much more conservative way (created, filled >with zeroes, and fsync'd before we ever put any valuable data in 'em). >If you have WAL on its own drive then I think Mike's recommendation of >no filesystem journalling at all for that drive is probably OK. Or >you can do same as above (journal metadata only) if you want a little >extra protection. > >And of course all this reasoning depends on the assumption that the >drive tells the truth about write-completion. If the drive does write >caching it had better be able to complete all its accepted writes before >dying in a power failure. (Hence, battery-backed write cache is OK, any >other kind is evil.) 
> regards, tom lane
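Tom's "journaled metadata, non-journaled file content" recommendation maps onto ext3's data=writeback mode. A sketch of how that might be configured — the device names and mount points are placeholders, and this applies only to the PostgreSQL data/WAL filesystems, not to everything on the box:

```shell
# /etc/fstab sketch: metadata-only journalling for the data directory,
# and likewise for a WAL volume on its own spindle.
#
#   /dev/sdb1  /var/lib/pgsql/data  ext3  noatime,data=writeback  1 2
#   /dev/sdc1  /var/lib/pgsql/wal   ext3  noatime,data=writeback  1 2

# The data= mode generally can't be switched by a live remount; either
# set it in fstab and remount from scratch, or bake it into the
# filesystem's default mount options:
tune2fs -o journal_data_writeback /dev/sdb1
```

With data=writeback only metadata goes through the journal, which is exactly the guarantee Postgres needs (structural sanity) without paying to journal file contents it already protects via WAL.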
Re: [PERFORM] extremly low memory usage
Ron wrote: Oops. There's a misconception here. ... OTOH, access time is _latency_, and that is not changed. Access time for a RAID set is equal to that of the slowest access time, AKA highest latency, HD in the RAID set. You're overgeneralizing from one specific type of raid, aren't you? The comp.arch.storage FAQ explicitly states that "raid level 1... shorter latency... since resolution can be from any of multiple disks". Other raid configurations increase latency (raid 3 increases latency from an average of 1/2 revolution to near 1 revolution); but no one really uses those. Ron Mayer
Re: [PERFORM] extremly low memory usage
Ron wrote: At 02:53 PM 8/20/2005, Jeremiah Jahn wrote: On Fri, 2005-08-19 at 16:03 -0500, John A Meinel wrote: > Jeremiah Jahn wrote: > > On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote: > > > > > it's cached alright. I'm getting a read rate of about 150MB/sec. I would > > have thought it would be faster with my raid setup. I think I'm going to > > scrap the whole thing and get rid of LVM. I'll just do a straight ext3 > > system. Maybe that will help. Still trying to get suggestions for a > > stripe size.

Well, since you can get a read of the RAID at 150MB/s, that means that it is actual I/O speed. It may not be cached in RAM. Perhaps you could try the same test, only using say 1G, which should be cached.

> I don't think 150MB/s is out of the realm for a 14 drive array. > How fast is > time dd if=/dev/zero of=testfile bs=8192 count=1000000

time dd if=/dev/zero of=testfile bs=8192 count=1000000
1000000+0 records in
1000000+0 records out
real 1m24.248s
user 0m0.381s
sys  0m33.028s

> (That should create an 8GB file, which is too big to cache everything) > And then how fast is: > time dd if=testfile of=/dev/null bs=8192 count=1000000

time dd if=testfile of=/dev/null bs=8192 count=1000000
1000000+0 records in
1000000+0 records out
real 0m54.139s
user 0m0.326s
sys  0m8.916s

and on a second run:

real 0m55.667s
user 0m0.341s
sys  0m9.013s

> > That should give you a semi-decent way of measuring how fast the RAID > system is, since it should be too big to cache in ram.

about 150MB/Sec. Is there no better way to make this go faster...?

I'm actually curious about PCI bus saturation at this point. Old 32-bit 33MHz PCI could only push 1Gbit = 100MB/s. Now, I'm guessing that this is a higher performance system. But I'm really surprised that your write speed is that close to your read speed. (100MB/s write, 150MB/s read).
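As a sanity check, the dd timings quoted above (an ~8GB file written in ~84s and read in ~54s) imply exactly the throughput figures being discussed; the arithmetic below assumes the file was 8192 B x 1,000,000 blocks:

```shell
# Back-of-envelope MB/s from the quoted dd timings (integer arithmetic,
# so results are rounded down).
bytes=$((8192 * 1000000))                       # ~8.2 GB test file
echo "write: $((bytes / 84 / 1000000)) MB/s"    # ~84 s elapsed -> ~97 MB/s
echo "read:  $((bytes / 54 / 1000000)) MB/s"    # ~54 s elapsed -> ~151 MB/s
```

That matches the ~100MB/s write and ~150MB/s read cited in the thread.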
Assuming these are U320 15Krpm 147GB HDs, a RAID 10 array of 14 of them doing raw sequential IO like this should be capable of ~7*75MB/s= 525MB/s using Seagate Cheetah 15K.4's, ~7*79MB/s= 553MB/s if using Fujitsu MAU's, and ~7*86MB/s= 602MB/s if using Maxtor Atlas 15K II's to devices external to the RAID array.

I know I thought these were SATA drives, over 2 controllers. I could be completely wrong, though.

_IF_ the controller setup is high powered enough to keep that kind of IO rate up. This will require a controller or controllers providing dual channel U320 bandwidth externally and quad channel U320 bandwidth internally. IOW, it needs a controller or controllers talking 64b 133MHz PCI-X, reasonably fast DSP/CPU units, and probably a decent sized IO buffer as well. AFAICT, the Dell PERC4 controllers use various flavors of the LSI Logic MegaRAID controllers. What I don't know is which exact one yours is, nor do I know if it (or any of the MegaRAID controllers) are high powered enough. Talk to your HW supplier to make sure you have controllers adequate to your HD's.

...and yes, your average access time will be in the 5.5ms - 6ms range when doing a physical seek. Even with RAID, you want to minimize seeks and maximize sequential IO when accessing them. Best to not go to HD at all ;-)

Well, certainly, if you can get more into RAM, you're always better off. For writing, a battery-backed write cache, and for reading, lots of system RAM.

Hope this helps, Ron Peacetree

John =:->
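Ron's RAID 10 estimate above is easy to reproduce: 14 drives form 7 mirrored pairs, and sequential reads stripe across the pairs, so aggregate bandwidth is roughly 7x a single drive's sustained rate:

```shell
# 7 mirrored pairs x per-drive sustained MB/s (figures quoted above for
# the Cheetah 15K.4, Fujitsu MAU, and Atlas 15K II respectively).
pairs=$((14 / 2))
for rate in 75 79 86; do
    echo "$rate MB/s per drive -> $((pairs * rate)) MB/s aggregate"
done
```

Against the measured ~150MB/s, that is why Ron concludes something in the controller path is the bottleneck.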
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote: I'm just watching gnome-system-monitor. Which, after careful consideration and looking at dstat, means I'm on CRACK. GSM isn't showing cached memory usage. I assume that the cache memory usage is where data off of the disks would be cached...?

Well, a simple "free" also tells you how much has been cached. I believe, by reading the _cach column, it looks like you have 4.6G cached. So you are indeed using memory. I'm still concerned why it seems to be taking 3-4ms per index lookup, when things should already be cached in RAM. Now, I may be wrong about whether the indexes are cached, but I sure would expect them to be. What is the time for a cached query on your system (with normal nested loops)? (Give the EXPLAIN ANALYZE for the *second* run, or maybe the fourth.) I'm glad that we aren't seeing something weird with your kernel, at least.

John =:->

memory output from dstat is this for a few seconds:

---procs--- ------memory-usage------ ---paging-- ---disk/sda--:--disk/sdb-- ---swap--- ----total-cpu-usage----
run blk new|_used _buff _cach _free|__in_ _out_|_read write:_read write|_used _free|usr sys idl wai hiq siq
 0   0   0|1336M  10M 4603M  17M| 490B  833B|3823B 3503k:1607k 4285k|160k 2048M|  4   1  89   7   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0   464k|160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0    48k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0   132k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0    36k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0    12k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 2   0   0|1353M  10M 4585M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   1  75   0   0   0
 1   0   0|1321M  10M 4616M  19M|   0     0 |   0     0 :   0     0 |160k 2048M| 18   8  74   0   0   0
 1   0   0|1326M  10M 4614M  17M|   0     0 |   0     0 :4096B    0 |160k 2048M| 16  10  74   1   0   0
 1   0   0|1330M  10M 4609M  17M|   0     0 |   0    12k:4096B    0 |160k 2048M| 17   9  74   0   0   0
 0   1   0|1343M  10M 4596M  17M|   0     0 |   0     0 :   0   316M|160k 2048M|  5  10  74  11   0   1
 0   1   0|1339M  10M 4596M  21M|   0     0 |   0     0 :   0     0 |160k 2048M|  0   0  74  25   0   1
 0   2   0|1334M  10M 4596M  25M|   0     0 |   0  4096B:   0     0 |160k 2048M|  0   0  54  44   0   1
 1   0   0|1326M  10M 4596M  34M|   0     0 |   0     0 :   0   364k|160k 2048M|  4   1  60  34   0   1
 1   0   0|1290M  10M 4596M  70M|   0     0 |   0    12k:   0     0 |160k 2048M| 24   1  75   0   0   0
 1   0   0|1301M  10M 4596M  59M|   0     0 |   0    20k:   0     0 |160k 2048M| 21   4  75   0   0   0
 1   0   0|1312M  10M 4596M  48M|   0     0 |   0     0 :   0     0 |160k 2048M| 22   4  75   0   0   0
 1   0   0|1323M  10M 4596M  37M|   0     0 |   0     0 :   0    24k|160k 2048M| 21   4  75   0   0   0
 1   0   0|1334M  10M 4596M  25M|   0     0 |   0     0 :   0    56k|160k 2048M| 21   4  75   0   0   0

On Fri, 2005-08-19 at 16:07 -0500, John A Meinel wrote: Jeremiah Jahn wrote: Rebuild in progress with just ext3 on the raid array...will see if this helps the access times. If it doesn't I'll mess with the stripe size. I have REINDEXED, CLUSTERED, tablespaced and cached with 'cat table/index > /dev/null'; none of this seems to have helped, or even increased my memory usage. argh! The only thing about this new system that I'm unfamiliar with is the array setup and LVM, which is why I think that's where the issue is. Clustering and indexing as well as vacuum etc. are things that I do and have been aware of for some time. Perhaps slony is a factor, but I really don't see it causing problems on index read speed, esp. when it's not running. thanx for your help, I really appreciate it. -jj-

By the way, how are you measuring memory usage? Can you give the output of that command, just to make sure you are reading it correctly.

John =:->
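The cache figure John refers to (the "_cach" column in dstat, the "cached" column in free) comes from /proc/meminfo on Linux; a quick way to pull the same number, shown here against a sample line so the parsing is clear:

```shell
# Print the page-cache size the same way free's "cached" column does.
# The sample value approximates the ~4.6G reported in the thread.
sample='Cached:        4603000 kB'
echo "$sample" | awk '/^Cached:/ {printf "%.1f GB cached\n", $2 / 1024 / 1024}'
# On a live system:  awk '/^Cached:/ {print $2 " kB"}' /proc/meminfo
```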
Re: [PERFORM] extremly low memory usage
At 02:16 PM 8/20/2005, Jeremiah Jahn wrote: I'm just watching gnome-system-monitor. Which, after careful consideration and looking at dstat, means I'm on CRACK. GSM isn't showing cached memory usage. I assume that the cache memory usage is where data off of the disks would be cached...? memory output from dstat is this for a few seconds:

---procs--- ------memory-usage------ ---paging-- ---disk/sda--:--disk/sdb-- ---swap--- ----total-cpu-usage----
run blk new|_used _buff _cach _free|__in_ _out_|_read write:_read write|_used _free|usr sys idl wai hiq siq
 0   0   0|1336M  10M 4603M  17M| 490B  833B|3823B 3503k:1607k 4285k|160k 2048M|  4   1  89   7   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0   464k|160k 2048M| 25   0  75   0   0   0
 1   0   0|1334M  10M 4596M  25M|   0     0 |   0     0 :   0    56k|160k 2048M| 21   4  75   0   0   0

Then the "low memory usage" was a chimera. Excellent!

Given the evidence in this thread, IMO you should upgrade your box to 16GB of RAM ASAP. That should be enough to cache most, if not all, of the 10GB of the "hot" part of your DB; thereby dedicating your HD subsystem as much as possible to writes (which are unavoidable HD IO). As I've posted before, at $75-$150/GB, it's well worth the investment whenever you can prove it will help, as we have here.

Hope this helps, Ron Peacetree
Re: [PERFORM] extremly low memory usage
I'm reposting this because my mailer hiccuped when I sent it the first time. If this results in a double post, I apologize.

At 02:53 PM 8/20/2005, Jeremiah Jahn wrote: On Fri, 2005-08-19 at 16:03 -0500, John A Meinel wrote: > Jeremiah Jahn wrote: > > On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote: > > > > > it's cached alright. I'm getting a read rate of about 150MB/sec. I would > > have thought it would be faster with my raid setup. I think I'm going to > > scrap the whole thing and get rid of LVM. I'll just do a straight ext3 > > system. Maybe that will help. Still trying to get suggestions for a > > stripe size. > > > > I don't think 150MB/s is out of the realm for a 14 drive array. > How fast is > time dd if=/dev/zero of=testfile bs=8192 count=1000000

time dd if=/dev/zero of=testfile bs=8192 count=1000000
1000000+0 records in
1000000+0 records out
real 1m24.248s
user 0m0.381s
sys  0m33.028s

> (That should create an 8GB file, which is too big to cache everything) > And then how fast is: > time dd if=testfile of=/dev/null bs=8192 count=1000000

time dd if=testfile of=/dev/null bs=8192 count=1000000
1000000+0 records in
1000000+0 records out
real 0m54.139s
user 0m0.326s
sys  0m8.916s

and on a second run:

real 0m55.667s
user 0m0.341s
sys  0m9.013s

> > That should give you a semi-decent way of measuring how fast the RAID > system is, since it should be too big to cache in ram.

about 150MB/Sec. Is there no better way to make this go faster...?

Assuming these are U320 15Krpm 147GB HDs, a RAID 10 array of 14 of them doing raw sequential IO like this should be capable of ~7*75MB/s= 525MB/s using Seagate Cheetah 15K.4's, ~7*79MB/s= 553MB/s if using Fujitsu MAU's, and ~7*86MB/s= 602MB/s if using Maxtor Atlas 15K II's to devices external to the RAID array. _IF_ the controller setup is high powered enough to keep that kind of IO rate up. This will require a controller or controllers providing dual channel U320 bandwidth externally and quad channel U320 bandwidth internally.
IOW, it needs a controller or controllers talking 64b 133MHz PCI-X, reasonably fast DSP/CPU units, and probably a decent sized IO buffer as well. AFAICT, the Dell PERC4 controllers use various flavors of the LSI Logic MegaRAID controllers. What I don't know is which exact one yours is, nor do I know if it (or any of the MegaRAID controllers) are high powered enough. Talk to your HW supplier to make sure you have controllers adequate to your HD's. ...and yes, your average access time will be in the 5.5ms - 6ms range when doing a physical seek. Even with RAID, you want to minimize seeks and maximize sequential IO when accessing them. Best to not go to HD at all ;-) Hope this helps, Ron Peacetree
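The 5.5-6 ms average access time Ron cites decomposes into rotational latency plus seek time; for a 15Krpm drive:

```shell
# One revolution at 15,000 rpm, in microseconds, and the average
# rotational latency (half a revolution); add a typical ~3.5-4 ms
# average seek to reach the quoted 5.5-6 ms access time.
rev_us=$((60 * 1000000 / 15000))
echo "revolution: ${rev_us} us (~4 ms)"
echo "avg rotational latency: $((rev_us / 2)) us (~2 ms)"
```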
Re: [PERFORM] extremly low memory usage
On Fri, 2005-08-19 at 16:03 -0500, John A Meinel wrote: > Jeremiah Jahn wrote: > > On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote: > > > >>Jeremiah Jahn wrote: > >> > > > ... > > >> > >>Well, in general, 3ms for a single lookup seems really long. Maybe your > >>index is bloated by not vacuuming often enough. Do you tend to get a lot > >>of updates to litigant_details? > > > > I have vacuumed this already. I get lots of updates, but this data is > > mostly unchanging. > > > > > >>There are a couple possibilities at this point. First, you can REINDEX > >>the appropriate index, and see if that helps. However, if this is a test > >>box, it sounds like you just did a dump and reload, which wouldn't have > >>bloat in an index. > > > > > > I loaded it using slony > > I don't know that slony versus pg_dump/pg_restore really matters. The > big thing is that Updates wouldn't be trashing your index. > But if you are saying that you cluster once/wk your index can't be that > messed up anyway. (Unless CLUSTER messes up the non-clustered indexes, > but that would make cluster much less useful, so I would have guessed > this was not the case) > > > > > > >>Another possibility. Is this the column that you usually use when > >>pulling information out of litigant_details? If so, you can CLUSTER > >>litigant_details on the appropriate index. This will help things be > >>close together that should be, which decreases the index lookup costs. > > > > clustering on this right now. Most of the other things are already > > clustered. name and case_data > > Just as a reality check, they are clustered on the columns in question, > right? (I don't know if this column is a primary key or not, but any > index can be used for clustering). > > > > > > >>However, if this is not the common column, then you probably will slow > >>down whatever other accesses you may have on this table. 
> >> > >>After CLUSTER, the current data will stay clustered, but new data will > >>not, so you have to continually CLUSTER, the same way that you might > >>VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as > >>expensive as a VACUUM FULL. Be aware of this, but it might vastly > >>improve your performance, so it would be worth it. > > > > I generally re-cluster once a week. > > > >>> > So there is no need for preloading your indexes on the identity table. > It is definitely not the bottleneck. > > So a few design bits, which may help your database. > Why is "actor_id" a text field instead of a number? > >>> > >>>This is simply due to the nature of the data. > >>> > >> > >>I'm just wondering if changing into a number, and using a number->name > >>lookup would be faster for you. It may not be. In general, I prefer to > >>use numbers for references. I may be over paranoid, but I know that some > >>locales are bad with string -> string comparisons. And since the data in > >>your database is stored as UNICODE, I'm not sure if it has to do any > >>translating or not. Again, something to consider, it may not make any > >>difference. > > > > I don't believe so. I initialize the DB as 'lang=C'. I used to have the > > problem where things were being inited as en_US. This would prevent any > > text based index from working. This doesn't seem to be the case here, so > > I'm not worried about it. > > > > Sorry, I think I was confusing you with someone else who posted SHOW ALL. > > > > > > >> > > ... > > > it's cached alright. I'm getting a read rate of about 150MB/sec. I would > > have thought it would be faster with my raid setup. I think I'm going to > > scrap the whole thing and get rid of LVM. I'll just do a straight ext3 > > system. Maybe that will help. Still trying to get suggestions for a > > stripe size. > > > > I don't think 150MB/s is out of the realm for a 14 drive array.
> How fast is > time dd if=/dev/zero of=testfile bs=8192 count=1000000

time dd if=/dev/zero of=testfile bs=8192 count=1000000
1000000+0 records in
1000000+0 records out
real 1m24.248s
user 0m0.381s
sys  0m33.028s

> (That should create an 8GB file, which is too big to cache everything) > And then how fast is: > time dd if=testfile of=/dev/null bs=8192 count=1000000

time dd if=testfile of=/dev/null bs=8192 count=1000000
1000000+0 records in
1000000+0 records out
real 0m54.139s
user 0m0.326s
sys  0m8.916s

and on a second run:

real 0m55.667s
user 0m0.341s
sys  0m9.013s

> > That should give you a semi-decent way of measuring how fast the RAID > system is, since it should be too big to cache in ram.

about 150MB/Sec. Is there no better way to make this go faster...?

> > > > >>I can point you to REINDEX and CLUSTER, but if it is caching in ram, I > >>honestly can't say why the per loop would be that much slower. > >>Are both systems running the same postgres version? It sounds like it is > >>different (since you say something about switching to 8.0). > > > > These had little or no effect. > > The produc
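The weekly re-CLUSTER routine mentioned above could be automated with a cron entry like the following (a sketch only: the database name and the off-hours window are assumptions, and note that CLUSTER takes an exclusive lock on the table for its whole duration):

```shell
# Hypothetical crontab entry: re-cluster litigant_details every Sunday
# at 03:00. With no index named, CLUSTER reuses the index the table
# was previously clustered on.
0 3 * * 0  psql -d mydb -c 'CLUSTER litigant_details;' >/dev/null 2>&1
```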
Re: [PERFORM] extremly low memory usage
On Sat, 2005-08-20 at 11:59 -0400, Ron wrote: > At 04:11 PM 8/19/2005, Jeremiah Jahn wrote: > >On Fri, 2005-08-19 at 14:23 -0500, John A Meinel wrote: > > > Ron wrote: > > > > At 01:18 PM 8/19/2005, John A Meinel wrote: > > > > > > > >> Jeremiah Jahn wrote: > > > >> > Sorry about the formatting. > > > >> > > > > >> > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > > > >> > > > > >> >>Jeremiah Jahn wrote: > > > >> >> > > > >> >> > > > >> > > > >> ... > > > >> > > > >> >>The expensive parts are the 4915 lookups into the litigant_details > > > >> (each > > > >> >>one takes approx 4ms for a total of ~20s). > > > >> >>And then you do it again on case_data (average 3ms each * 4906 loops > > > >> >>= > > > >> >>~15s). > > > >> > > > > >> > Is there some way to avoid this? > > > >> > > > > >> > > > >> Well, in general, 3ms for a single lookup seems really long. Maybe your > > > >> index is bloated by not vacuuming often enough. Do you tend to get a > > > >> lot > > > >> of updates to litigant_details? > > > > > > > > > > > > Given that the average access time for a 15Krpm HD is in the 5.5-6ms > > > > range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single > > > > lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case) > > > > table accesses is requiring a seek. > > > > > >I think LVM may be a problem, since it also seems to break things up on > >the file system. My access time on the seek should be around 1/7th the > >15Krpm I believe since it's a 14 disk raid 10 array. And no other > >traffic at the moment. > > Oops. There's a misconception here. RAID arrays increase > _throughput_ AKA _bandwidth_ through parallel access to HDs. OTOH, > access time is _latency_, and that is not changed. Access time for a > RAID set is equal to that of the slowest access time, AKA highest > latency, HD in the RAID set. so I will max out at the 5.5-6ms range for access time?
> > > > Well, from what he has said, the total indexes are < 1GB and he has 6GB > > > of ram. So everything should fit. Not to mention he is only accessing > > > 5000/several million rows. > >I table spaced some of the indexes and they are around 211066880 bytes > >for the name_speed index and 149825330 for the lit_actor_speed index > >tables seem to be about a gig. > > Hmm. And you think you are only using 250MB out of your 6GB of > RAM? Something doesn't seem to add up here. From what's been > posted, I'd expect much more RAM to be in use. The cached memory usage is completely using up the rest of the memory. > > > > > > This implies a poor match between physical layout and access pattern. > > > > > > This seems to be the case. But since this is not the only query, it may > > > be that other access patterns are more important to optimize for. > > > > > > > > > > > If I understand correctly, the table should not be very fragmented given > > > > that this is a reasonably freshly loaded DB? That implies that the > > > > fields being looked up are not well sorted in the table compared to the > > > > query pattern. > > > > > > > > If the entire table could fit in RAM, this would be far less of a > > > > consideration. Failing that, the physical HD layout has to be improved > > > > or the query pattern has to be changed to reduce seeks. > > > > > > > > > > > > > > ... > > > > > > >> After CLUSTER, the current data will stay clustered, but new data will > > > >> not, so you have to continually CLUSTER, the same way that you might > > > >> VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as > > > >> expensive as a VACUUM FULL. Be aware of this, but it might vastly > > > >> improve your performance, so it would be worth it. > > > > > > > > > > > > CLUSTER can be a very large maintenance overhead/problem if the table(s) > > > > in question actually need to be "continually" re CLUSTER ed.
> > > > > > > > If there is no better solution available, then you do what you have to, > > > > but it feels like there should be a better answer here. > > > > > > > > Perhaps the DB schema needs examining to see if it matches up well with > > > > its real usage? > > > > > > > > Ron Peacetree > > > > > > > > > > I certainly agree that CLUSTER is expensive, and is an on-going > > > maintenance issue. If it is the normal access pattern, though, it may be > > > worth it. > > > >The query I've sent you is one of the most common I get; just change the > >name. I handle about 180K of them a day, mostly between 8 and 5. The > >clustering has never really been a problem. Like I said before, I do it > >about once a week. I handle about 3000 updates an hour consisting of > >about 1000-3000 statements per update, i.e. about 2.5 million updates per > >hour. In the last few months or so I've filtered these down to about > >400K update/delete/insert statements per hour. > > 2.5M updates per hour = ~695 updates per second. 400K per hour = > ~112 updates per sec. These should be well within the capabilities > of a RAID
Re: [PERFORM] extremly low memory usage
I'm just watching gnome-system-monitor. Which, after careful consideration and looking at dstat, means I'm on CRACK. GSM isn't showing cached memory usage. I assume that the cache memory usage is where data off of the disks would be cached...?

memory output from dstat is this for a few seconds:

---procs--- ------memory-usage------ ---paging-- ---disk/sda--:--disk/sdb-- ---swap--- ----total-cpu-usage----
run blk new|_used _buff _cach _free|__in_ _out_|_read write:_read write|_used _free|usr sys idl wai hiq siq
 0   0   0|1336M  10M 4603M  17M| 490B  833B|3823B 3503k:1607k 4285k|160k 2048M|  4   1  89   7   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0   464k|160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0    48k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0   132k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0    36k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0    12k:   0     0 |160k 2048M| 25   0  75   0   0   0
 1   0   0|1337M  10M 4600M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   0  75   0   0   0
 2   0   0|1353M  10M 4585M  18M|   0     0 |   0     0 :   0     0 |160k 2048M| 25   1  75   0   0   0
 1   0   0|1321M  10M 4616M  19M|   0     0 |   0     0 :   0     0 |160k 2048M| 18   8  74   0   0   0
 1   0   0|1326M  10M 4614M  17M|   0     0 |   0     0 :4096B    0 |160k 2048M| 16  10  74   1   0   0
 1   0   0|1330M  10M 4609M  17M|   0     0 |   0    12k:4096B    0 |160k 2048M| 17   9  74   0   0   0
 0   1   0|1343M  10M 4596M  17M|   0     0 |   0     0 :   0   316M|160k 2048M|  5  10  74  11   0   1
 0   1   0|1339M  10M 4596M  21M|   0     0 |   0     0 :   0     0 |160k 2048M|  0   0  74  25   0   1
 0   2   0|1334M  10M 4596M  25M|   0     0 |   0  4096B:   0     0 |160k 2048M|  0   0  54  44   0   1
 1   0   0|1326M  10M 4596M  34M|   0     0 |   0     0 :   0   364k|160k 2048M|  4   1  60  34   0   1
 1   0   0|1290M  10M 4596M  70M|   0     0 |   0    12k:   0     0 |160k 2048M| 24   1  75   0   0   0
 1   0   0|1301M  10M 4596M  59M|   0     0 |   0    20k:   0     0 |160k 2048M| 21   4  75   0   0   0
 1   0   0|1312M  10M 4596M  48M|   0     0 |   0     0 :   0     0 |160k 2048M| 22   4  75   0   0   0
 1   0   0|1323M  10M 4596M  37M|   0     0 |   0     0 :   0    24k|160k 2048M| 21   4  75   0   0   0
 1   0   0|1334M  10M 4596M  25M|   0     0 |   0     0 :   0    56k|160k 2048M| 21   4  75   0   0   0

On Fri, 2005-08-19 at 16:07 -0500, John A Meinel wrote: > Jeremiah Jahn wrote: > > Rebuild in progress with just ext3 on the raid array...will see if this > > helps the access times. If it doesn't I'll mess with the stripe size. I > > have REINDEXED, CLUSTERED, tablespaced and cached with 'cat table/index > > > /dev/null'; none of this seems to have helped, or even increased my > > memory usage. argh! The only thing about this new system that I'm > > unfamiliar with is the array setup and LVM, which is why I think that's > > where the issue is. clustering and indexing as well as vacuum etc are > > things that I do and have been aware of for sometime. Perhaps slony is a > > factor, but I really don't see it causing problems on index read speed > > esp. when it's not running. > > > > thanx for your help, I really appreciate it. > > -jj- > > > > By the way, how are you measuring memory usage? Can you give the output > of that command, just to make sure you are reading it correctly. > > John > =:->

-- Speak softly and carry a +6 two-handed sword.
Re: [PERFORM] extremly low memory usage
At 04:11 PM 8/19/2005, Jeremiah Jahn wrote: On Fri, 2005-08-19 at 14:23 -0500, John A Meinel wrote: > Ron wrote: > > At 01:18 PM 8/19/2005, John A Meinel wrote: > > > >> Jeremiah Jahn wrote: > >> > Sorry about the formatting. > >> > > >> > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > >> > > >> >>Jeremiah Jahn wrote: > >> >> > >> >> > >> > >> ... > >> > >> >>The expensive parts are the 4915 lookups into the litigant_details > >> (each > >> >>one takes approx 4ms for a total of ~20s). > >> >>And then you do it again on case_data (average 3ms each * 4906 loops = > >> >>~15s). > >> > > >> > Is there some way to avoid this? > >> > > >> > >> Well, in general, 3ms for a single lookup seems really long. Maybe your > >> index is bloated by not vacuuming often enough. Do you tend to get a lot > >> of updates to litigant_details? > > > > > > Given that the average access time for a 15Krpm HD is in the 5.5-6ms > > range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single > > lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case) > > table accesses is requiring a seek. > > I think LVM may be a problem, since it also seems to break things up on the file system. My access time on the seek should be around 1/7th the 15Krpm I believe since it's a 14 disk raid 10 array. And no other traffic at the moment. Oops. There's a misconception here. RAID arrays increase _throughput_ AKA _bandwidth_ through parallel access to HDs. OTOH, access time is _latency_, and that is not changed. Access time for a RAID set is equal to that of the slowest access time, AKA highest latency, HD in the RAID set. > Well, from what he has said, the total indexes are < 1GB and he has 6GB > of ram. So everything should fit. Not to mention he is only accessing > 5000/several million rows. I table spaced some of the indexes and they are around 211066880 bytes for the name_speed index and 149825330 for the lit_actor_speed index tables seem to be about a gig. Hmm. 
And you think you are only using 250MB out of your 6GB of RAM? Something doesn't seem to add up here. From what's been posted, I'd expect much more RAM to be in use. > > This implies a poor match between physical layout and access pattern. > > This seems to be the case. But since this is not the only query, it may > be that other access patterns are more important to optimize for. > > > > > If I understand correctly, the table should not be very fragmented given > > that this is a reasonably freshly loaded DB? That implies that the > > fields being looked up are not well sorted in the table compared to the > > query pattern. > > > > If the entire table could fit in RAM, this would be far less of a > > consideration. Failing that, the physical HD layout has to be improved > > or the query pattern has to be changed to reduce seeks. > > > > > > ... > > >> After CLUSTER, the current data will stay clustered, but new data will > >> not, so you have to continually CLUSTER, the same way that you might > >> VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as > >> expensive as a VACUUM FULL. Be aware of this, but it might vastly > >> improve your performance, so it would be worth it. > > > > > > CLUSTER can be a very large maintenance overhead/problem if the table(s) > > in question actually need to be "continually" re CLUSTER ed. > > > > If there is no better solution available, then you do what you have to, > > but it feels like there should be a better answer here. > > > > Perhaps the DB schema needs examining to see if it matches up well with > > its real usage? > > > > Ron Peacetree > > > > I certainly agree that CLUSTER is expensive, and is an on-going > maintenance issue. If it is the normal access pattern, though, it may be > worth it. The query I've sent you is one of the most common I get just change the name. I handle about 180K of them a day mostly between 8 and 5. The clustering has never really been a problem. 
Like I said before, I do it about once a week. I handle about 3000 updates an hour consisting of about 1000-3000 statements per update, i.e. about 2.5 million updates per hour. In the last few months or so I've filtered these down to about 400K update/delete/insert statements per hour. 2.5M updates per hour = ~695 updates per second. 400K per hour = ~112 updates per sec. These should be well within the capabilities of a RAID 10 subsystem based on 14 15Krpm HDs assuming a decent RAID card. What is the exact HW of the RAID subsystem involved and how is it configured? You shouldn't be having a performance problem AFAICT... > I also wonder, though, if his table is properly normalized. Which, as you mentioned, might lead to improved access patterns. The system is about as normalized as I can get it. In general the layout is the following: courts have cases, cases have litigant_details. Actors have identities and litigant_details. Hmmm. Can you tell us more about the actual schema, I may h
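Ron's rate conversion above is easy to verify (his ~695 and ~112 are rounded up; integer division rounds down):

```shell
# Updates per second implied by the hourly figures quoted in the thread.
echo "2.5M/hour -> $((2500000 / 3600)) updates/sec"
echo "400K/hour -> $((400000 / 3600)) updates/sec"
```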
Re: [PERFORM] extremly low memory usage
Michael Stone <[EMAIL PROTECTED]> writes: > On Sat, Aug 20, 2005 at 02:17:54PM +0300, Marko Ristola wrote: >> Based on my knowledge, Ext3 is good with keeping filesystem integrity >> AND data integrity while pressing the reset button. However, by >> selecting data=writeback, you gain more speed, but you risk the data >> integrity during a crash: Ext3 guarantees only filesystem integrity. > That's why postgres keeps its own transaction log. Any of these > filesystems guarantee data integrity for data that's been synced to > disk, and postgres keeps track of what data has and has not been > committed so it can recover gracefully from a failure. Right. I think the optimal setting for a Postgres data directory is journaled metadata, non-journaled file content. Postgres can take care of the data integrity for itself, but it does assume that the filesystem stays structurally sane (eg, data blocks don't get reassigned to the wrong file), so you need a filesystem guarantee about the metadata. WAL files are handled in a much more conservative way (created, filled with zeroes, and fsync'd before we ever put any valuable data in 'em). If you have WAL on its own drive then I think Mike's recommendation of no filesystem journalling at all for that drive is probably OK. Or you can do same as above (journal metadata only) if you want a little extra protection. And of course all this reasoning depends on the assumption that the drive tells the truth about write-completion. If the drive does write caching it had better be able to complete all its accepted writes before dying in a power failure. (Hence, battery-backed write cache is OK, any other kind is evil.) regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
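A concrete sketch of that layout, assuming the data directory and WAL sit on separate drives. The device names and mount points below are invented for illustration, not taken from the thread; on ext3, data=writeback journals metadata only, and plain ext2 has no journal at all:

```shell
# Hypothetical /etc/fstab entries illustrating the advice above.
# Device names and mount points are examples only.

# Data directory: journal metadata, not file contents, and skip
# atime updates while we're at it.
#   /dev/sdb1  /var/lib/pgsql/data  ext3  noatime,data=writeback  0 0

# WAL (pg_xlog) drive: no journalling at all, per Mike's suggestion.
#   /dev/sdc1  /var/lib/pgsql/wal   ext2  noatime  0 0
```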
Re: [PERFORM] extremly low memory usage
On Sat, Aug 20, 2005 at 02:17:54PM +0300, Marko Ristola wrote: Based on my knowledge, Ext3 is good with keeping filesystem integrity AND data integrity while pressing the reset button. However, by selecting data=writeback, you gain more speed, but you risk the data integrity during a crash: Ext3 guarantees only filesystem integrity. That's why postgres keeps its own transaction log. Any of these filesystems guarantee data integrity for data that's been synced to disk, and postgres keeps track of what data has and has not been committed so it can recover gracefully from a failure. That's why most filesystems are designed the way they are; the application can determine what things need better data integrity and which need better performance on a case-by-case basis. Mike Stone ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] extremly low memory usage
On Sat, Aug 20, 2005 at 01:12:15AM -0600, Dan Harris wrote: XFS seems to be a trusted choice, followed by Reiser and JFS both with the occasional controversy when the comparisons pop up. And don't put the xlog on a journaled filesystem. There is no advantage to doing so, and it will slow things down. (Assuming a sane seperate xlog partition configuration, sized reasonably.) Mike Stone ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] extremly low memory usage
Dan Harris wrote: > From my recent experiences, I can say ext3 is probably not a great > choice for Pg databases. If you check the archives you'll see > there's a lot of discussion about various journalling filesystems and > ext3 usually (always?) comes up at the bottom as far as performance > goes. If you insist on using it, I would at least recommend the > noatime option in fstab and using data=writeback to get the faster Based on my knowledge, Ext3 is good with keeping filesystem integrity AND data integrity while pressing the reset button. However, by selecting data=writeback, you gain more speed, but you risk the data integrity during a crash: Ext3 guarantees only filesystem integrity. This means with database transaction logs: The last transactions are not guaranteed to be written into the hard drives during a hardware reset, meaning a loss of some committed transactions. Reiserfs is known to do things this wrong way as well. Is there a way with a Reiserfs filesystem to fulfill both filesystem AND data integrity requirements nowadays? See for example "man mount" for the effects of data=journal, data=ordered (default) and data=writeback for Ext3. Only writeback risks data integrity. Ext3 is the only journaled filesystem that I know of that fulfills these fundamental data integrity guarantees. Personally I like such filesystems, even though they mean less speed. Marko Ristola ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] extremly low memory usage
On Aug 19, 2005, at 3:01 PM, Jeremiah Jahn wrote: Rebuild in progress with just ext3 on the raid array...will see if this helps the access times. From my recent experiences, I can say ext3 is probably not a great choice for Pg databases. If you check the archives you'll see there's a lot of discussion about various journalling filesystems and ext3 usually (always?) comes up at the bottom as far as performance goes. If you insist on using it, I would at least recommend the noatime option in fstab and using data=writeback to get the faster of the journal modes. XFS seems to be a trusted choice, followed by Reiser and JFS, both with the occasional controversy when the comparisons pop up. -Dan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote: > Rebuild in progress with just ext3 on the raid array...will see if this > helps the access times. If it doesn't I'll mess with the stripe size. I > have REINDEXED, CLUSTERED, tablespaced and cached with 'cat table/index > /dev/null'; none of this seems to have helped, or even increased my > memory usage. argh! The only thing about this new system that I'm > unfamiliar with is the array setup and LVM, which is why I think that's > where the issue is. clustering and indexing as well as vacuum etc are > things that I do and have been aware of for some time. Perhaps slony is a > factor, but I really don't see it causing problems on index read speed > esp. when it's not running. > > thanx for your help, I really appreciate it. > -jj- > By the way, how are you measuring memory usage? Can you give the output of that command, just to make sure you are reading it correctly. John =:->
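On the measurement question: on Linux, top and free count the page cache separately from process memory, which is the usual reason a busy database box appears to be using almost no RAM. A quick way to see what the kernel is actually caching (a sketch; the field names are standard Linux /proc/meminfo fields):

```shell
# Show total RAM and how much the kernel is holding as buffer/page cache.
# A warm database box should show a large "Cached:" figure even when
# per-process "used" memory looks tiny.
grep -E '^(MemTotal|Buffers|Cached):' /proc/meminfo
```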
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote: > On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote: > >>Jeremiah Jahn wrote: >> ... >> >>Well, in general, 3ms for a single lookup seems really long. Maybe your >>index is bloated by not vacuuming often enough. Do you tend to get a lot >>of updates to litigant_details? > > I have vacuumed this already. I get lots of updates, but this data is > mostly unchanging. > > >>There are a couple possibilities at this point. First, you can REINDEX >>the appropriate index, and see if that helps. However, if this is a test >>box, it sounds like you just did a dump and reload, which wouldn't have >>bloat in an index. > > > I loaded it using slony I don't know that slony versus pg_dump/pg_restore really matters. The big thing is that Updates wouldn't be trashing your index. But if you are saying that you cluster once/wk your index can't be that messed up anyway. (Unless CLUSTER messes up the non-clustered indexes, but that would make cluster much less useful, so I would have guessed this was not the case) > > >>Another possibility. Is this the column that you usually use when >>pulling information out of litigant_details? If so, you can CLUSTER >>litigant_details on the appropriate index. This will help things be >>close together that should be, which decreases the index lookup costs. > > clustering on this right now. Most of the other things are already > clustered. name and case_data Just as a reality check, they are clustered on the columns in question, right? (I don't know if this column is a primary key or not, but any index can be used for clustering). > > >>However, if this is not the common column, then you probably will slow >>down whatever other accesses you may have on this table. >> >>After CLUSTER, the current data will stay clustered, but new data will >>not, so you have to continually CLUSTER, the same way that you might >>VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as >>expensive as a VACUUM FULL. 
Be aware of this, but it might vastly >>improve your performance, so it would be worth it. > > I generally re-cluster once a week. > >>> So there is no need for preloading your indexes on the identity table. It is definitely not the bottleneck. So a few design bits, which may help your database. Why is "actor_id" a text field instead of a number? >>> >>>This is simply due to the nature of the data. >>> >> >>I'm just wondering if changing into a number, and using a number->name >>lookup would be faster for you. It may not be. In general, I prefer to >>use numbers for references. I may be over paranoid, but I know that some >>locales are bad with string -> string comparisons. And since the data in >>your database is stored as UNICODE, I'm not sure if it has to do any >>translating or not. Again, something to consider, it may not make any >>difference. > > I don't believe so. I initialize the DB as 'lang=C'. I used to have the > problem where things were being inited as en_US. This would prevent any > text based index from working. This doesn't seem to be the case here, so > I'm not worried about it. > Sorry, I think I was confusing you with someone else who posted SHOW ALL. > > >> ... > it's cached alright. I'm getting a read rate of about 150MB/sec. I would > have thought it would be faster with my raid setup. I think I'm going to > scrap the whole thing and get rid of LVM. I'll just do a straight ext3 > system. Maybe that will help. Still trying to get suggestions for a > stripe size. > I don't think 150MB/s is out of the realm for a 14 drive array. How fast is time dd if=/dev/zero of=testfile bs=8192 count=1000000 (That should create an 8GB file, which is too big to cache everything) And then how fast is: time dd if=testfile of=/dev/null bs=8192 count=1000000 That should give you a semi-decent way of measuring how fast the RAID system is, since it should be too big to cache in ram.
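Converting the timings from a test like that into a throughput number is just bytes over elapsed seconds; a throwaway helper (the 55-second figure below is made up purely for illustration):

```shell
# MB/s from bytes transferred and elapsed seconds (integer arithmetic).
mbps() { echo $(( $1 / $2 / 1048576 )); }

# e.g. the ~8GB dd test file read back in a hypothetical 55 seconds:
mbps $(( 8192 * 1000000 )) 55    # prints 142
```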
> >>I can point you to REINDEX and CLUSTER, but if it is caching in ram, I >>honestly can't say why the per loop would be that much slower. >>Are both systems running the same postgres version? It sounds like it is >>different (since you say something about switching to 8.0). > > These had little or no effect. > The production machine is running 7.4 while the devel machine is running > 8.0 > Well, my concern is that maybe some portion of the 8.0 code actually slowed things down for you. You could try reverting to 7.4 on the devel box, though I think playing with upgrading to 8.1 might be more worthwhile. ... > > this is a cached version. > I assume that you mean this is the second run of the query. I can't compare it too much, since this is "smith" rather than "jones". But this one is 17s rather than the other one being 46s. And that includes having 8k rows instead of having 5k rows. Have you tried other values with disabled nested loops? Because this query (at least in cached form) seems to be *way* faster than with nested loops. I know that you
Re: [PERFORM] extremly low memory usage
Rebuild in progress with just ext3 on the raid array...will see if this helps the access times. If it doesn't I'll mess with the stripe size. I have REINDEXED, CLUSTERED, tablespaced and cached with 'cat table/index > /dev/null' none of this seems to have helped, or even increased my memory usage. argh! The only thing about this new system that I'm unfamiliar with is the array setup and LVM, which is why I think that's where the issue is. clustering and indexing as well as vacuum etc are things that I do and have been aware of for sometime. Perhaps slony is a factor, but I really don't see it causing problems on index read speed esp. when it's not running. thanx for your help, I really appreciate it. -jj- On Fri, 2005-08-19 at 14:23 -0500, John A Meinel wrote: > Ron wrote: > > At 01:18 PM 8/19/2005, John A Meinel wrote: > > > >> Jeremiah Jahn wrote: > >> > Sorry about the formatting. > >> > > >> > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > >> > > >> >>Jeremiah Jahn wrote: > >> >> > >> >> > >> > >> ... > >> > >> >>The expensive parts are the 4915 lookups into the litigant_details > >> (each > >> >>one takes approx 4ms for a total of ~20s). > >> >>And then you do it again on case_data (average 3ms each * 4906 loops = > >> >>~15s). > >> > > >> > Is there some way to avoid this? > >> > > >> > >> Well, in general, 3ms for a single lookup seems really long. Maybe your > >> index is bloated by not vacuuming often enough. Do you tend to get a lot > >> of updates to litigant_details? > > > > > > Given that the average access time for a 15Krpm HD is in the 5.5-6ms > > range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single > > lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case) > > table accesses is requiring a seek. > > > > > Well, from what he has said, the total indexes are < 1GB and he has 6GB > of ram. So everything should fit. Not to mention he is only accessing > 5000/several million rows. 
> > > > This implies a poor match between physical layout and access pattern. > > This seems to be the case. But since this is not the only query, it may > be that other access patterns are more important to optimize for. > > > > > If I understand correctly, the table should not be very fragmented given > > that this is a reasonably freshly loaded DB? That implies that the > > fields being looked up are not well sorted in the table compared to the > > query pattern. > > > > If the entire table could fit in RAM, this would be far less of a > > consideration. Failing that, the physical HD layout has to be improved > > or the query pattern has to be changed to reduce seeks. > > > > > > ... > > >> After CLUSTER, the current data will stay clustered, but new data will > >> not, so you have to continually CLUSTER, the same way that you might > >> VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as > >> expensive as a VACUUM FULL. Be aware of this, but it might vastly > >> improve your performance, so it would be worth it. > > > > > > CLUSTER can be a very large maintenance overhead/problem if the table(s) > > in question actually need to be "continually" re CLUSTER ed. > > > > If there is no better solution available, then you do what you have to, > > but it feels like there should be a better answer here. > > > > Perhaps the DB schema needs examining to see if it matches up well with > > its real usage? > > > > Ron Peacetree > > > > I certainly agree that CLUSTER is expensive, and is an on-going > maintenance issue. If it is the normal access pattern, though, it may be > worth it. > > I also wonder, though, if his table is properly normalized. Which, as > you mentioned, might lead to improved access patterns. > > John > =:-> -- Speak softly and carry a +6 two-handed sword. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] extremly low memory usage
On Fri, 2005-08-19 at 14:23 -0500, John A Meinel wrote: > Ron wrote: > > At 01:18 PM 8/19/2005, John A Meinel wrote: > > > >> Jeremiah Jahn wrote: > >> > Sorry about the formatting. > >> > > >> > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > >> > > >> >>Jeremiah Jahn wrote: > >> >> > >> >> > >> > >> ... > >> > >> >>The expensive parts are the 4915 lookups into the litigant_details > >> (each > >> >>one takes approx 4ms for a total of ~20s). > >> >>And then you do it again on case_data (average 3ms each * 4906 loops = > >> >>~15s). > >> > > >> > Is there some way to avoid this? > >> > > >> > >> Well, in general, 3ms for a single lookup seems really long. Maybe your > >> index is bloated by not vacuuming often enough. Do you tend to get a lot > >> of updates to litigant_details? > > > > > > Given that the average access time for a 15Krpm HD is in the 5.5-6ms > > range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single > > lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case) > > table accesses is requiring a seek. > > I think LVM may be a problem, since it also seems to break things up on the file system. My access time on the seek should be around 1/7th that of a single 15Krpm drive, I believe, since it's a 14 disk raid 10 array. And no other traffic at the moment. > > > Well, from what he has said, the total indexes are < 1GB and he has 6GB > of ram. So everything should fit. Not to mention he is only accessing > 5000/several million rows. I tablespaced some of the indexes and they are around 211066880 bytes for the name_speed index and 149825330 for the lit_actor_speed index. The tables seem to be about a gig. > > > > This implies a poor match between physical layout and access pattern. > > This seems to be the case. But since this is not the only query, it may > be that other access patterns are more important to optimize for.
> > > > > If I understand correctly, the table should not be very fragmented given > > that this is a reasonably freshly loaded DB? That implies that the > > fields being looked up are not well sorted in the table compared to the > > query pattern. > > > > If the entire table could fit in RAM, this would be far less of a > > consideration. Failing that, the physical HD layout has to be improved > > or the query pattern has to be changed to reduce seeks. > > > > > > ... > > >> After CLUSTER, the current data will stay clustered, but new data will > >> not, so you have to continually CLUSTER, the same way that you might > >> VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as > >> expensive as a VACUUM FULL. Be aware of this, but it might vastly > >> improve your performance, so it would be worth it. > > > > > > CLUSTER can be a very large maintenance overhead/problem if the table(s) > > in question actually need to be "continually" re CLUSTER ed. > > > > If there is no better solution available, then you do what you have to, > > but it feels like there should be a better answer here. > > > > Perhaps the DB schema needs examining to see if it matches up well with > > its real usage? > > > > Ron Peacetree > > > > I certainly agree that CLUSTER is expensive, and is an on-going > maintenance issue. If it is the normal access pattern, though, it may be > worth it. The query I've sent you is one of the most common I get, just change the name. I handle about 180K of them a day, mostly between 8 and 5. The clustering has never really been a problem. Like I said before I do it about once a week. I handle about 3000 updates an hour, consisting of about 1000-3000 statements per update, i.e. about 2.5 million updates per hour. In the last few months or so I've filtered these down to about 400K update/delete/insert statements per hour. > > I also wonder, though, if his table is properly normalized. Which, as > you mentioned, might lead to improved access patterns.
The system is about as normalized as I can get it. In general the layout is the following: courts have cases, cases have litigant_details. Actors have identities and litigant_details. > > John > =:-> -- Speak softly and carry a +6 two-handed sword. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] extremly low memory usage
On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote: > Jeremiah Jahn wrote: > > Sorry about the formatting. > > > > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > > > >>Jeremiah Jahn wrote: > >> > >> > > ... > > >>The expensive parts are the 4915 lookups into the litigant_details (each > >>one takes approx 4ms for a total of ~20s). > >>And then you do it again on case_data (average 3ms each * 4906 loops = > >>~15s). > > > > Is there some way to avoid this? > > > > Well, in general, 3ms for a single lookup seems really long. Maybe your > index is bloated by not vacuuming often enough. Do you tend to get a lot > of updates to litigant_details? I have vacuumed this already. I get lots of updates, but this data is mostly unchanging. > > There are a couple possibilities at this point. First, you can REINDEX > the appropriate index, and see if that helps. However, if this is a test > box, it sounds like you just did a dump and reload, which wouldn't have > bloat in an index. I loaded it using slony > > Another possibility. Is this the column that you usually use when > pulling information out of litigant_details? If so, you can CLUSTER > litigant_details on the appropriate index. This will help things be > close together that should be, which decreases the index lookup costs. clustering on this right now. Most of the other things are already clustered. name and case_data > > However, if this is not the common column, then you probably will slow > down whatever other accesses you may have on this table. > > After CLUSTER, the current data will stay clustered, but new data will > not, so you have to continually CLUSTER, the same way that you might > VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as > expensive as a VACUUM FULL. Be aware of this, but it might vastly > improve your performance, so it would be worth it. I generally re-cluster once a week. > > > > > > >>So there is no need for preloading your indexes on the identity table. 
> >>It is definitely not the bottleneck. > >> > >>So a few design bits, which may help your database. > >>Why is "actor_id" a text field instead of a number? > > > > This is simply due to the nature of the data. > > > > I'm just wondering if changing into a number, and using a number->name > lookup would be faster for you. It may not be. In general, I prefer to > use numbers for references. I may be over paranoid, but I know that some > locales are bad with string -> string comparisons. And since the data in > your database is stored as UNICODE, I'm not sure if it has to do any > translating or not. Again, something to consider, it may not make any > difference. I don't believe so. I initialize the DB as 'lang=C'. I used to have the problem where things were being inited as en_US. This would prevent any text based index from working. This doesn't seem to be the case here, so I'm not worried about it. > > > > > >>You could try creating an index on "litigant_details (actor_id, > >>count_ori)" so that it can do just an index lookup, rather than an index > >>+ filter. > > > > I have one, but it doesn't seem to like to use it. Don't really need it > > though, I can just drop the court_id out of the query. It's redundant, > > since each actor_id is also unique in litigant details. I had run vac > > full and analyze but I ran them again anyway and the planning improved. > > However, my 14 disk raid 10 array is still slower than my 3 disk raid 5 > > on my production box. 46sec vs 30sec (with live traffic on the > > production) One of the strange things is that when I run the cat command > > on my index and tables that are "HOT" it has no effect on memory usage. > > Right now I'm running ext3 on LVM. I'm still in a position to redo the > > file system and everything. Is this a good way to do it or should I > > switch to something else? What about stripe and extent sizes...? kernel > > parameters to change? > > Well, the plans are virtually identical.
There is one small difference > as to whether it joins against case_data or court first. But 'court' is > very tiny (small enough to use a seqscan instead of index scan) I'm a > little surprised with court being this small that it doesn't do > something like a hash aggregation, but court takes no time anyway. > > The real problem is that your nested loop index time is *much* slower. > > Devel: > -> Index Scan using lit_actor_speed on litigant_details > (cost=0.00..3.96 rows=1 width=81) > (actual time=4.788..4.812 rows=1 loops=5057) > > Production: > -> Index Scan using lit_actor_speed on litigant_details > (cost=0.00..5.63 rows=1 width=81) > (actual time=3.355..3.364 rows=1 loops=5057) > > Devel: > -> Index Scan using case_speed on case_data > (cost=0.00..3.46 rows=1 width=26) > (actual time=4.222..4.230 rows=1 loops=5052) > > Production: > -> Index Scan using case_data_pkey on case_data > (cost=0.00..5.31 rows=1 width=26) > (actual ti
Re: [PERFORM] extremly low memory usage
Ron wrote: > At 01:18 PM 8/19/2005, John A Meinel wrote: > >> Jeremiah Jahn wrote: >> > Sorry about the formatting. >> > >> > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: >> > >> >>Jeremiah Jahn wrote: >> >> >> >> >> >> ... >> >> >>The expensive parts are the 4915 lookups into the litigant_details >> (each >> >>one takes approx 4ms for a total of ~20s). >> >>And then you do it again on case_data (average 3ms each * 4906 loops = >> >>~15s). >> > >> > Is there some way to avoid this? >> > >> >> Well, in general, 3ms for a single lookup seems really long. Maybe your >> index is bloated by not vacuuming often enough. Do you tend to get a lot >> of updates to litigant_details? > > > Given that the average access time for a 15Krpm HD is in the 5.5-6ms > range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single > lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case) > table accesses is requiring a seek. > Well, from what he has said, the total indexes are < 1GB and he has 6GB of ram. So everything should fit. Not to mention he is only accessing 5000/several million rows. > This implies a poor match between physical layout and access pattern. This seems to be the case. But since this is not the only query, it may be that other access patterns are more important to optimize for. > > If I understand correctly, the table should not be very fragmented given > that this is a reasonably freshly loaded DB? That implies that the > fields being looked up are not well sorted in the table compared to the > query pattern. > > If the entire table could fit in RAM, this would be far less of a > consideration. Failing that, the physical HD layout has to be improved > or the query pattern has to be changed to reduce seeks. > > ... >> After CLUSTER, the current data will stay clustered, but new data will >> not, so you have to continually CLUSTER, the same way that you might >> VACUUM. 
*However*, IIRC CLUSTER grabs an Exclusive lock, so it is as >> expensive as a VACUUM FULL. Be aware of this, but it might vastly >> improve your performance, so it would be worth it. > > > CLUSTER can be a very large maintenance overhead/problem if the table(s) > in question actually need to be "continually" re CLUSTER ed. > > If there is no better solution available, then you do what you have to, > but it feels like there should be a better answer here. > > Perhaps the DB schema needs examining to see if it matches up well with > its real usage? > > Ron Peacetree > I certainly agree that CLUSTER is expensive, and is an on-going maintenance issue. If it is the normal access pattern, though, it may be worth it. I also wonder, though, if his table is properly normalized. Which, as you mentioned, might lead to improved access patterns. John =:->
Re: [PERFORM] extremly low memory usage
At 01:18 PM 8/19/2005, John A Meinel wrote: Jeremiah Jahn wrote: > Sorry about the formatting. > > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > >>Jeremiah Jahn wrote: >> >> ... >>The expensive parts are the 4915 lookups into the litigant_details (each >>one takes approx 4ms for a total of ~20s). >>And then you do it again on case_data (average 3ms each * 4906 loops = >>~15s). > > Is there some way to avoid this? > Well, in general, 3ms for a single lookup seems really long. Maybe your index is bloated by not vacuuming often enough. Do you tend to get a lot of updates to litigant_details? Given that the average access time for a 15Krpm HD is in the 5.5-6ms range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case) table accesses is requiring a seek. This implies a poor match between physical layout and access pattern. If I understand correctly, the table should not be very fragmented given that this is a reasonably freshly loaded DB? That implies that the fields being looked up are not well sorted in the table compared to the query pattern. If the entire table could fit in RAM, this would be far less of a consideration. Failing that, the physical HD layout has to be improved or the query pattern has to be changed to reduce seeks. There are a couple possibilities at this point. First, you can REINDEX the appropriate index, and see if that helps. However, if this is a test box, it sounds like you just did a dump and reload, which wouldn't have bloat in an index. Another possibility. Is this the column that you usually use when pulling information out of litigant_details? If so, you can CLUSTER litigant_details on the appropriate index. This will help things be close together that should be, which decreases the index lookup costs. However, if this is not the common column, then you probably will slow down whatever other accesses you may have on this table. 
After CLUSTER, the current data will stay clustered, but new data will not, so you have to continually CLUSTER, the same way that you might VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as expensive as a VACUUM FULL. Be aware of this, but it might vastly improve your performance, so it would be worth it. CLUSTER can be a very large maintenance overhead/problem if the table(s) in question actually need to be "continually" re CLUSTER ed. If there is no better solution available, then you do what you have to, but it feels like there should be a better answer here. Perhaps the DB schema needs examining to see if it matches up well with its real usage? Ron Peacetree ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
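For reference, the weekly re-CLUSTER routine Jeremiah describes could be scripted along these lines. This is only a sketch: the database name is invented, the table/index pairings are guesses from names mentioned in the thread, and it uses the pre-8.1 "CLUSTER index ON table" syntax current when this was written. Each CLUSTER holds an exclusive lock for its duration, so schedule it off-hours.

```shell
# Hypothetical weekly maintenance job; names are assumptions (see above).
psql -d courts -c 'CLUSTER lit_actor_speed ON litigant_details;'
psql -d courts -c 'CLUSTER case_speed ON case_data;'
# Refresh planner statistics after the physical reorder.
psql -d courts -c 'ANALYZE;'
```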
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote: > Sorry about the formatting. > > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > >>Jeremiah Jahn wrote: >> >> ... >>The expensive parts are the 4915 lookups into the litigant_details (each >>one takes approx 4ms for a total of ~20s). >>And then you do it again on case_data (average 3ms each * 4906 loops = >>~15s). > > Is there some way to avoid this? > Well, in general, 3ms for a single lookup seems really long. Maybe your index is bloated by not vacuuming often enough. Do you tend to get a lot of updates to litigant_details? There are a couple possibilities at this point. First, you can REINDEX the appropriate index, and see if that helps. However, if this is a test box, it sounds like you just did a dump and reload, which wouldn't have bloat in an index. Another possibility. Is this the column that you usually use when pulling information out of litigant_details? If so, you can CLUSTER litigant_details on the appropriate index. This will help things be close together that should be, which decreases the index lookup costs. However, if this is not the common column, then you probably will slow down whatever other accesses you may have on this table. After CLUSTER, the current data will stay clustered, but new data will not, so you have to continually CLUSTER, the same way that you might VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as expensive as a VACUUM FULL. Be aware of this, but it might vastly improve your performance, so it would be worth it. > > >>So there is no need for preloading your indexes on the identity table. >>It is definitely not the bottleneck. >> >>So a few design bits, which may help your database. >>Why is "actor_id" a text field instead of a number? > > This is simply due to the nature of the data. > I'm just wondering if changing into a number, and using a number->name lookup would be faster for you. It may not be. In general, I prefer to use numbers for references. 
I may be overly paranoid, but I know that some locales are slow with string-to-string comparisons. And since the data in your database is stored as UNICODE, I'm not sure whether it has to do any translating or not. Again, something to consider; it may not make any difference.

>>You could try creating an index on "litigant_details (actor_id,
>>court_ori)" so that it can do just an index lookup, rather than an index
>>+ filter.
>
> I have one, but it doesn't seem to like to use it. Don't really need it
> though; I can just drop the court_id out of the query. It's redundant,
> since each actor_id is also unique in litigant_details. I had run vacuum
> full and analyze, but I ran them again anyway and the planning improved.
> However, my 14-disk RAID 10 array is still slower than the 3-disk RAID 5
> on my production box: 46 sec vs 30 sec (with live traffic on
> production). One of the strange things is that when I run the cat
> command on my indexes and tables that are "HOT", it has no effect on
> memory usage. Right now I'm running ext3 on LVM. I'm still in a position
> to redo the file system and everything. Is this a good way to do it, or
> should I switch to something else? What about stripe and extent
> sizes...? Kernel parameters to change?

Well, the plans are virtually identical. There is one small difference as to whether it joins against case_data or court first, but 'court' is very tiny (small enough to use a seqscan instead of an index scan). I'm a little surprised, with court being this small, that it doesn't do something like a hash aggregation, but court takes no time anyway.

The real problem is that your nested loop index time is *much* slower.
Devel:
  -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.788..4.812 rows=1 loops=5057)
Production:
  -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..5.63 rows=1 width=81) (actual time=3.355..3.364 rows=1 loops=5057)

Devel:
  -> Index Scan using case_speed on case_data (cost=0.00..3.46 rows=1 width=26) (actual time=4.222..4.230 rows=1 loops=5052)
Production:
  -> Index Scan using case_data_pkey on case_data (cost=0.00..5.31 rows=1 width=26) (actual time=1.897..1.904 rows=1 loops=5052)

Notice that the actual per-row time is as much as 1/2 lower on production than on your devel box.

As a test, can you do "time cat $index_file >/dev/null" a couple of times, and then work out the MB/s? Alternatively, run vmstat in another shell: if the reads/s doesn't change, then you know the "cat" is being served from RAM, and thus the file really is cached.

I can point you to REINDEX and CLUSTER, but if it is already cached in RAM, I honestly can't say why the per-loop time would be that much slower. Are both systems running the same postgres version? It sounds like they are different (since you say something about switching to 8.0). I doubt it will help, but you might try an 8.1devel version.

...
>>Do you regularly vacuum analyze your tables?
Re: [PERFORM] extremly low memory usage
Sorry about the formatting.

On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote:
> Jeremiah Jahn wrote:
> >here's an example standard query. I really have to make the first hit go
> >faster. The table is clustered on full_name as well. 'Smith%'
> >took 87 seconds on the first hit. I wonder if I set up my array wrong.
> >I remember seeing something about DMA access versus something else, and
> >chose DMA access. LVM maybe?
>
> It would be nice if you would format your queries to be a little bit
> easier to read before posting them.
> However, I believe I am reading it correctly, to say that the index scan
> on identity is not your slow point. In fact, as near as I can tell, it
> only takes 52ms to complete.
>
> The expensive parts are the 4915 lookups into litigant_details (each
> one takes approx 4ms for a total of ~20s).
> And then you do it again on case_data (average 3ms each * 4906 loops =
> ~15s).

Is there some way to avoid this?

> So there is no need for preloading your indexes on the identity table.
> It is definitely not the bottleneck.
>
> So a few design bits, which may help your database.
> Why is "actor_id" a text field instead of a number?

This is simply due to the nature of the data.

> You could try creating an index on "litigant_details (actor_id,
> court_ori)" so that it can do just an index lookup, rather than an index
> + filter.

I have one, but it doesn't seem to like to use it. Don't really need it though; I can just drop the court_id out of the query. It's redundant, since each actor_id is also unique in litigant_details. I had run vacuum full and analyze, but I ran them again anyway and the planning improved. However, my 14-disk RAID 10 array is still slower than my 3-disk RAID 5 on my production box: 46 sec vs 30 sec (with live traffic on production). One of the strange things is that when I run the cat command on my indexes and tables that are "HOT", it has no effect on memory usage. Right now I'm running ext3 on LVM.
I'm still in a position to redo the file system and everything. Is this a good way to do it, or should I switch to something else? What about stripe and extent sizes...? Kernel parameters to change?

---devel box:---
copa=# EXPLAIN ANALYZE select full_name,identity_id,identity.case_id,court.id,date_of_birth,assigned_case_role,litigant_details.impound_litigant_data
copa-# from identity
copa-# join litigant_details on identity.actor_id = litigant_details.actor_id
copa-# join case_data on litigant_details.case_id = case_data.case_id and litigant_details.court_ori = case_data.court_ori
copa-# join court on identity.court_ori = court.id
copa-# where identity.court_ori = 'IL081025J' and full_name like 'JONES%' order by full_name;

QUERY PLAN
Nested Loop (cost=3.29..29482.22 rows=3930 width=86) (actual time=114.060..46001.480 rows=5052 loops=1)
  -> Nested Loop (cost=3.29..16193.27 rows=3820 width=112) (actual time=93.038..24584.275 rows=5052 loops=1)
    -> Nested Loop (cost=0.00..16113.58 rows=3820 width=113) (actual time=85.778..24536.489 rows=5052 loops=1)
      -> Index Scan using name_speed on identity (cost=0.00..824.72 rows=3849 width=82) (actual time=50.284..150.133 rows=5057 loops=1)
        Index Cond: (((full_name)::text >= 'JONES'::character varying) AND ((full_name)::text < 'JONET'::character varying))
        Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'JONES%'::text))
      -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.788..4.812 rows=1 loops=5057)
        Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
    -> Materialize (cost=3.29..3.30 rows=1 width=12) (actual time=0.002..0.003 rows=1 loops=5052)
      -> Seq Scan on court (cost=0.00..3.29 rows=1 width=12) (actual time=7.248..7.257 rows=1 loops=1)
        Filter: ('IL081025J'::text = (id)::text)
  -> Index Scan using case_speed on case_data (cost=0.00..3.46 rows=1 width=26) (actual time=4.222..4.230 rows=1 loops=5052)
    Index Cond:
((("outer".court_ori)::text = (case_data.court_ori)::text) AND (("outer".case_id)::text = (case_data.case_id)::text))
Total runtime: 46005.994 ms

> More importantly, though, the planner seems to think the join of
> identity to litigant_details will only return 1 row, not 5000.
> Do you regularly vacuum analyze your tables?
> Just as a test, try running:
> set enable_nested_loop to off;

Not quite acceptable: Total runtime: 221486.149 ms

> And then run EXPLAIN ANALYZE again, just to see if it is faster.
>
> You probably need to increase some statistics targets, so that the
> planner can design better plans.
Re: [PERFORM] extremly low memory usage
At 01:55 PM 8/18/2005, John Arbash Meinel wrote:

Jeremiah Jahn wrote:
>here's an example standard query. I really have to make the first hit go
>faster. The table is clustered on full_name as well. 'Smith%'
>took 87 seconds on the first hit. I wonder if I set up my array wrong.
>I remember seeing something about DMA access versus something else, and
>chose DMA access. LVM maybe?

It would be nice if you would format your queries to be a little bit easier to read before posting them.
However, I believe I am reading it correctly, to say that the index scan on identity is not your slow point. In fact, as near as I can tell, it only takes 52ms to complete.

The expensive parts are the 4915 lookups into litigant_details (each one takes approx 4ms for a total of ~20s). And then you do it again on case_data (average 3ms each * 4906 loops = ~15s).

How big are litigant_details and case_data? If they can fit in RAM, preload them using methods like the "cat to /dev/null" trick, and those table lookups will be ~100-1000x faster. If they won't fit into RAM but the machine can be expanded to hold enough RAM to fit the tables, it's well worth the ~$75-$150/GB to upgrade the server so that the tables will fit into RAM.

If they can't be made to fit into RAM as atomic entities, you have a few choices:

A= Put the data tables and indexes on separate dedicated spindles, and put litigant_details and case_data each on their own dedicated spindles. This will lower seek conflicts. Again, if this requires buying some more HDs, it's well worth it.

B= Break litigant_details and case_data into a set of smaller tables (based on something sane like the first n characters of the primary key) such that the smaller tables easily fit into RAM. Given that you've said only 10GB/60GB is "hot", this could work very well. Combine it with "A" above (put all the litigant_details sub-tables on one dedicated spindle set and all the case_data sub-tables on another spindle set) for added oomph.
C= Buy an SSD big enough to hold litigant_details and case_data and put them there. Again, this can be combined with "A" and "B" above to lessen the size of the SSD needed.

So there is no need for preloading your indexes on the identity table. It is definitely not the bottleneck.

So a few design bits, which may help your database.
Why is "actor_id" a text field instead of a number?
You could try creating an index on "litigant_details (actor_id, court_ori)" so that it can do just an index lookup, rather than an index + filter.

Yes, that certainly sounds like it would be more efficient.

More importantly, though, the planner seems to think the join of identity to litigant_details will only return 1 row, not 5000. Do you regularly vacuum analyze your tables? Just as a test, try running:

set enable_nested_loop to off;

And then run EXPLAIN ANALYZE again, just to see if it is faster. You probably need to increase some statistics targets, so that the planner can design better plans.

> -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
>   -> Nested Loop (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
>     -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
>       Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
>       -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
>         Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
>         Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
>       -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
>         Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
>         Filter: ('IL081025J'::text = (court_ori)::text)
>     -> Seq Scan on
court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
>       Filter: ('IL081025J'::text = (id)::text)
>   -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
>     Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))

---(end of broadcast)---
TIP 6: explain analyze is your friend
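Option B above (splitting the big tables by a prefix of the primary key) could be sketched like this. The table and column names come from the schema shown in this thread, but the prefix list and the LIKE-based split key are purely hypothetical illustrations, not something the poster specified:

```shell
# Hypothetical sketch of option B: generate DDL splitting case_data into
# per-prefix child tables. Prefixes and the CHECK condition are assumptions.
for prefix in A B C; do
  cat <<SQL
CREATE TABLE case_data_${prefix} (
  CHECK (case_id LIKE '${prefix}%')   -- documents which slice of keys lives here
) INHERITS (case_data);
SQL
done > split_case_data.sql
cat split_case_data.sql
```

The application (or a set of views) would then route queries to the right sub-table so each working set stays small enough to live in RAM.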
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote:
>here's an example standard query. I really have to make the first hit go
>faster. The table is clustered on full_name as well. 'Smith%'
>took 87 seconds on the first hit. I wonder if I set up my array wrong.
>I remember seeing something about DMA access versus something else, and
>chose DMA access. LVM maybe?

It would be nice if you would format your queries to be a little bit easier to read before posting them. However, I believe I am reading it correctly, to say that the index scan on identity is not your slow point. In fact, as near as I can tell, it only takes 52ms to complete.

The expensive parts are the 4915 lookups into litigant_details (each one takes approx 4ms for a total of ~20s). And then you do it again on case_data (average 3ms each * 4906 loops = ~15s).

So there is no need for preloading your indexes on the identity table. It is definitely not the bottleneck.

So a few design bits, which may help your database. Why is "actor_id" a text field instead of a number? You could try creating an index on "litigant_details (actor_id, court_ori)" so that it can do just an index lookup, rather than an index + filter.

More importantly, though, the planner seems to think the join of identity to litigant_details will only return 1 row, not 5000. Do you regularly vacuum analyze your tables? Just as a test, try running:

set enable_nested_loop to off;

And then run EXPLAIN ANALYZE again, just to see if it is faster. You probably need to increase some statistics targets, so that the planner can design better plans.
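The statistics-target advice could look roughly like this in 8.0-era syntax. The column choice and the target value are assumptions for illustration, since the post doesn't name them; the join column on identity is a plausible candidate given the misestimate:

```shell
# Hedged sketch: raise the planner's statistics target on the join column,
# then re-analyze so the new target takes effect. Values are guesses.
cat > stats.sql <<'SQL'
ALTER TABLE identity ALTER COLUMN actor_id SET STATISTICS 200;  -- default is 10
ANALYZE identity;
SQL
cat stats.sql
# Apply with (not run here): psql -d copa -f stats.sql
```

A larger target makes ANALYZE keep more histogram buckets and most-common-value entries, which is what lets the planner see that the join returns ~5000 rows rather than 1.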
> -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
>   -> Nested Loop (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
>     -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
>       Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
>       -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
>         Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
>         Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
>       -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
>         Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
>         Filter: ('IL081025J'::text = (court_ori)::text)
>     -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
>       Filter: ('IL081025J'::text = (id)::text)
>   -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
>     Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))

John =:->
Re: [PERFORM] extremly low memory usage
here's an example standard query. I really have to make the first hit go faster. The table is clustered on full_name as well. 'Smith%' took 87 seconds on the first hit. I wonder if I set up my array wrong. I remember seeing something about DMA access versus something else, and chose DMA access. LVM maybe?

explain analyze select distinct case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data, to_number(trim(leading case_data.type_code from trim(leading case_data.case_year from case_data.case_id)),'99') as seq
from identity,court,litigant_details,case_data
where identity.court_ori = litigant_details.court_ori and identity.case_id = litigant_details.case_id and identity.actor_id = litigant_details.actor_id and court.id = identity.court_ori and identity.court_ori = case_data.court_ori and case_data.case_id = identity.case_id and identity.court_ori = 'IL081025J' and full_name like 'MILLER%'
order by full_name;

QUERY PLAN
---
Unique (cost=20411.84..20411.91 rows=2 width=173) (actual time=38340.231..38355.120 rows=4906 loops=1)
  -> Sort (cost=20411.84..20411.84 rows=2 width=173) (actual time=38340.227..38343.667 rows=4906 loops=1)
    Sort Key: identity.full_name, case_data.case_category, identity.identity_id, court.name, litigant_details.case_id, case_data.case_year, identity.date_of_birth, litigant_details.assigned_case_role, litigant_details.court_ori, litigant_details.actor_id, case_data.type_code, case_data.subtype_code, litigant_details.impound_litigant_data, to_number(ltrim(ltrim((case_data.case_id)::text, (case_data.case_year)::text), (case_data.type_code)::text), '99'::text)
    -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
      -> Nested Loop (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906
loops=1)
        -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
          Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
          -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
            Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
            Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
          -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
            Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
            Filter: ('IL081025J'::text = (court_ori)::text)
        -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
          Filter: ('IL081025J'::text = (id)::text)
      -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
        Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))
Total runtime: 38359.722 ms
(18 rows)

copa=> explain analyze select distinct case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data, to_number(trim(leading case_data.type_code from trim(leading case_data.case_year from case_data.case_id)),'99') as seq from identity,court,litigant_details,case_data where identity.court_ori = litigant_details.court_ori and identity.case_id = litigant_details.case_id and identity.actor_id = litigant_details.actor_id and court.id = identity.court_ori and identity.court_ori = c
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote:
>On Wed, 2005-08-17 at 21:21 -0500, John A Meinel wrote:
>>Jeremiah Jahn wrote:
>>>I just put together a system with 6GB of ram on a 14 disk raid 10 array.
>>>When I run my usual big painful queries, I get very little to no
>>>memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used
>>>most of the time. the new devel box sits at around 250MB.
>>>
>>>I've switched to an 8.0 system on the new devel box, but the .conf
>>>really didn't change. Index usage is the same. Something seems wrong and
>>>I'm not sure why.
>>
>>How big is your actual database on disk? And how much of it is actually
>>touched by your queries?
>
>The DB is about 60GB. About 10GB is actually used in real queries,
>versus "get me this single record with this ID". I have a large query that
>finds court cases based on certain criteria that is name based. I get a
>full seq scan on the name table in about 7 seconds; this table has about
>6 million names (most being 'smith, something'). The index scan takes
>much less time of course, once it's been cached (somewhere, but not
>apparently memory). The real query can take 60 seconds on a first run,
>and 1.3 seconds on a second run. I'm very happy with the cached results,
>just not really sure where that caching is happening, since it doesn't
>show up as memory usage. I do know that the caching that happens seems
>to be independent of the DB: I can restart the DB and my speeds are
>still the same as the cached second query. Is there some way to
>pre-cache some of the tables/files on the file system? If I switch my
>query to search for 'jones%' instead of 'smith%', I take a hit. But if I
>then rerun the smith search, I still get cached speed. I only have two
>tables, essentially names and events, that have to do any real work, i.e.
>not very atomic data. I'd love to be able to force these two tables into
>a cache somewhere. This is a linux system (RHEL ES4) by the way.
I think what is happening is that *some* of the index pages are being cached, just not all of them. Most indexes (if you didn't specify anything special) are btrees, so you load the root page and then determine what pages need to be loaded from there. So the "jones%" pages aren't anywhere near the "smith%" pages, and they don't need to be loaded if you aren't accessing them. So the required memory usage might be smaller than you think, at least until all of the index pages have been accessed.

The reason it is DB independent is that the OS is caching file access (you read a file, it keeps the old pages in RAM in case you ask for them again).

Part of the trick is that as you use the database, it will cache what has been used. So you may not need to do anything; it should sort itself out with time. However, if you have to have cached performance as soon as your machine reboots, you could figure out what files on disk represent your indexes and tables, and then just "cat $files >/dev/null". That should cause a read on those files, which should pull them into the memory cache. *However*, this will fail if the size of those files is greater than available memory, so you may want to be a little bit stingy about what you preload.

Alternatively, you could just write an SQL script which runs a bunch of indexed queries to make sure all the pages get loaded. Something like:

FOR curname IN SELECT DISTINCT name FROM users LOOP
    SELECT name FROM users WHERE name = curname;
END LOOP;

(In actual PL/pgSQL the inner query would need to be a PERFORM, since its result is discarded.) That should make the database go through the entire table and load the index for every user. This is overkill, and will probably take a long time to execute. But you could do it if you wanted.

>>It seems that your tough queries might only be exercising a portion of
>>the database. If you really want to make memory usage increase try
>>something like:
>>find . -type f -print0 | xargs -0 cat >/dev/null
>>Which should read all the files. After doing that, does the memory usage
>>increase?
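A minimal, self-contained sketch of the preloading trick described above, run against a scratch directory so it is safe anywhere. For the real case, DIR would point at the table and index files under $PGDATA/base, assuming they fit in RAM:

```shell
# Demo of the "cat everything to /dev/null" preload; uses scratch files.
# Reading a file once pulls its pages into the OS page cache.
DIR=$(mktemp -d)
dd if=/dev/zero of="$DIR/table_file" bs=1M count=8 2>/dev/null
dd if=/dev/zero of="$DIR/index_file" bs=1M count=8 2>/dev/null
find "$DIR" -type f -print0 | xargs -0 cat >/dev/null
N=$(find "$DIR" -type f | wc -l)
echo "preloaded $N files"
rm -rf "$DIR"
```

As noted above, this only helps if the total size of the preloaded files is comfortably below free RAM; otherwise the later files simply evict the earlier ones.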
>>>any thoughts,
>>>-jj-
>>>
>>>shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each
>>>work_mem = 2097151 # min 64, size in KB
>>
>>This seems awfully high. 2GB per sort? This might actually be flushing
>>some of your ram, since it would get allocated and filled, and then
>>freed when finished. Remember, depending on what you are doing, this
>>amount can get allocated more than once per query.
>
>What's a good way to determine the optimal size?

Practice. :) A few questions, I guess... How many concurrent connections are you expecting? How many joins does a standard query have? How big are the joins?

In general, I would tend to make this a smaller number, so that the OS has more room to cache tables, rather than having big buffers for joins. If someone is requesting a join that requires a lot of rows, I would rather *that* query be slower than have it impact everyone else. I would put it more with a maximum in the 20-100MB range.
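The sizing questions above boil down to a worst-case memory budget: concurrent connections, times sort/hash operations per query, times work_mem. A back-of-envelope sketch with assumed numbers (none of these inputs come from the thread except the 6GB box):

```shell
# Back-of-envelope work_mem budget; all inputs below are illustrative guesses.
RAM_MB=6144            # the 6GB devel box
CONNECTIONS=50         # assumed concurrent backends
SORTS_PER_QUERY=3      # each sort/hash node in a plan can claim its own work_mem
WORK_MEM_MB=32         # candidate setting (vs. the ~2GB in the posted conf)
WORST_CASE=$(( CONNECTIONS * SORTS_PER_QUERY * WORK_MEM_MB ))
echo "worst case: ${WORST_CASE} MB of ${RAM_MB} MB RAM"
```

Even at a modest 32MB, fifty busy connections can theoretically claim most of the machine; with the posted 2GB setting, a single multi-sort query could exhaust RAM on its own, which is why the advice above is to keep the default small and raise it per-session for the rare big query.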
Re: [PERFORM] extremly low memory usage
On Wed, 2005-08-17 at 21:21 -0500, John A Meinel wrote:
> Jeremiah Jahn wrote:
> > I just put together a system with 6GB of ram on a 14 disk raid 10 array.
> > When I run my usual big painful queries, I get very little to no
> > memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used
> > most of the time. the new devel box sits at around 250MB.
> >
> > I've switched to an 8.0 system on the new devel box, but the .conf
> > really didn't change. Index usage is the same. Something seems wrong and
> > I'm not sure why.
>
> How big is your actual database on disk? And how much of it is actually
> touched by your queries?

The DB is about 60GB. About 10GB is actually used in real queries, versus "get me this single record with this ID". I have a large query that finds court cases based on certain criteria that is name based. I get a full seq scan on the name table in about 7 seconds; this table has about 6 million names (most being 'smith, something'). The index scan takes much less time of course, once it's been cached (somewhere, but not apparently memory). The real query can take 60 seconds on a first run, and 1.3 seconds on a second run. I'm very happy with the cached results, just not really sure where that caching is happening, since it doesn't show up as memory usage. I do know that the caching that happens seems to be independent of the DB: I can restart the DB and my speeds are still the same as the cached second query. Is there some way to pre-cache some of the tables/files on the file system? If I switch my query to search for 'jones%' instead of 'smith%', I take a hit. But if I then rerun the smith search, I still get cached speed. I only have two tables, essentially names and events, that have to do any real work, i.e. not very atomic data. I'd love to be able to force these two tables into a cache somewhere. This is a linux system (RHEL ES4) by the way.

> It seems that your tough queries might only be exercising a portion of
> the database.
> If you really want to make memory usage increase try
> something like:
> find . -type f -print0 | xargs -0 cat >/dev/null
> Which should read all the files. After doing that, does the memory usage
> increase?
>
> > any thoughts,
> > -jj-
> >
> > shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each
> > work_mem = 2097151 # min 64, size in KB
>
> This seems awfully high. 2GB Per sort? This might actually be flushing
> some of your ram, since it would get allocated and filled, and then
> freed when finished. Remember, depending on what you are doing, this
> amount can get allocated more than once per query.

What's a good way to determine the optimal size?

> > maintenance_work_mem = 819200 # min 1024, size in KB
> > max_fsm_pages = 8 # min max_fsm_relations*16, 6 bytes each
> > checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
> > effective_cache_size = 360 <- this is a little out of control, but
> > would it have any real effect?
>
> It should just tell the planner that it is more likely to have buffers
> in cache, so index scans are slightly cheaper than they would otherwise be.
>
> > random_page_cost = 2 # units are one sequential page fetch cost
> > log_min_duration_statement = 1 # -1 is disabled, in milliseconds.
> > lc_messages = 'C' # locale for system error message strings
> > lc_monetary = 'C' # locale for monetary formatting
> > lc_numeric = 'C' # locale for number formatting
> > lc_time = 'C' # locale for time formatting
>
> John
> =:->

--
"Now this is a totally brain damaged algorithm. Gag me with a smurfette."
-- P. Buhr, Computer Science 354

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] extremly low memory usage
On Aug 17, 2005, at 10:11 PM, Jeremiah Jahn wrote:

> I just put together a system with 6GB of ram on a 14 disk raid 10 array.
> When I run my usual big painful queries, I get very little to no
> memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used
> most of the time. the new devel box sits at around 250MB.

Is the system performing fine? Are you touching as much data as the production box? If the system is performing fine, don't worry about it.

> work_mem = 2097151 # min 64, size in KB

This is EXTREMELY high. You realize this is the amount of memory that can be used per-sort and per-hash-build in a query? You can end up with multiples of this on a single query. If you have some big queries that are run infrequently, have them set it manually.

> effective_cache_size = 360 <- this is a little out of control, but
> would it have any real effect?

This doesn't allocate anything - it is a hint to the planner about how much data it can assume is cached.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Re: [PERFORM] extremly low memory usage
Jeremiah Jahn wrote:
> I just put together a system with 6GB of ram on a 14 disk raid 10 array.
> When I run my usual big painful queries, I get very little to no
> memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used
> most of the time. the new devel box sits at around 250MB.
>
> I've switched to an 8.0 system on the new devel box, but the .conf
> really didn't change. Index usage is the same. Something seems wrong and
> I'm not sure why.

How big is your actual database on disk? And how much of it is actually touched by your queries?

It seems that your tough queries might only be exercising a portion of the database. If you really want to make memory usage increase, try something like:

find . -type f -print0 | xargs -0 cat >/dev/null

Which should read all the files. After doing that, does the memory usage increase?

> any thoughts,
> -jj-
>
> shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each
> work_mem = 2097151 # min 64, size in KB

This seems awfully high. 2GB per sort? This might actually be flushing some of your ram, since it would get allocated and filled, and then freed when finished. Remember, depending on what you are doing, this amount can get allocated more than once per query.

> maintenance_work_mem = 819200 # min 1024, size in KB
> max_fsm_pages = 8 # min max_fsm_relations*16, 6 bytes each
> checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
> effective_cache_size = 360 <- this is a little out of control, but
> would it have any real effect?

It should just tell the planner that it is more likely to have buffers in cache, so index scans are slightly cheaper than they would otherwise be.

> random_page_cost = 2 # units are one sequential page fetch cost
> log_min_duration_statement = 1 # -1 is disabled, in milliseconds.
> lc_messages = 'C' # locale for system error message strings
> lc_monetary = 'C' # locale for monetary formatting
> lc_numeric = 'C' # locale for number formatting
> lc_time = 'C' # locale for time formatting

John =:->
[PERFORM] extremly low memory usage
I just put together a system with 6GB of ram on a 14 disk raid 10 array. When I run my usual big painful queries, I get very little to no memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used most of the time. The new devel box sits at around 250MB.

I've switched to an 8.0 system on the new devel box, but the .conf really didn't change. Index usage is the same. Something seems wrong and I'm not sure why.

any thoughts,
-jj-

shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each
work_mem = 2097151 # min 64, size in KB
maintenance_work_mem = 819200 # min 1024, size in KB
max_fsm_pages = 8 # min max_fsm_relations*16, 6 bytes each
checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
effective_cache_size = 360 <- this is a little out of control, but would it have any real effect?
random_page_cost = 2 # units are one sequential page fetch cost
log_min_duration_statement = 1 # -1 is disabled, in milliseconds.
lc_messages = 'C' # locale for system error message strings
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting

--
"Now this is a totally brain damaged algorithm. Gag me with a smurfette."
-- P. Buhr, Computer Science 354